List: xmlbeans-user
Subject: RE: Cannot encode my XML document output into UTF-8
From: "Radu Preotiuc-Pietro" <radup () bea ! com>
Date: 2006-08-30 16:54:18
Message-ID: 99479F4D39C9244F8E17E688193A3DD8BD7CB1 () repbex02 ! amer ! bea ! com
I couldn't find the time to look at this in detail, but here's a suggestion that may help:
I think TextPad (like Notepad) looks at the first bytes of the file and, if it sees
a byte-order mark such as FF FE, decides that the encoding is Unicode. An XML file,
however, relies on the encoding="UTF-8" part of its XML declaration to announce its
encoding rather than on those leading bytes, and TextPad doesn't pick that up. In
other words, I think you're fine. Try putting some non-ASCII characters in your file,
open it in TextPad, set the encoding manually to UTF-8, and check that the characters
come out the same.
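To make the point about leading bytes concrete, here is a minimal sketch of the kind of byte-order-mark check an editor might perform. It is an assumption about how such tools behave in general, not TextPad's actual logic, and the class and method names (BomSniffer, detectBom) are made up for illustration. It only covers the UTF-8 and UTF-16 marks.

```java
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** Returns the encoding implied by a leading byte-order mark, or null if there is none. */
    static String detectBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        return null; // no mark: the tool has to guess or read the XML declaration
    }

    public static void main(String[] args) {
        // Java's UTF-8 encoder does not write a byte-order mark, so a file produced
        // this way starts directly with '<' and gives the sniffer nothing to go on.
        byte[] xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("BOM says: " + detectBom(xml));
    }
}
```

A file without a mark can still be perfectly valid UTF-8; the absence of a mark only means that a tool relying on this check will fall back to some default such as ANSI.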
The main idea in this story is that there is no "standard" mechanism to decide whether
a set of bytes is text in UTF-8 encoding, text in ASCII encoding, or a JPEG image
(that's why XML needed an "encoding" attribute, by the way). So as long as you have
rules and mechanisms to ensure that the same encoding is used throughout your system,
you are OK.
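One place where the "same encoding throughout" rule can slip is a bytes-to-String-to-bytes round trip. In the quoted code, xmlStream.print(bos) goes through bos.toString(), which decodes the buffer with the platform default charset before the PrintStream re-encodes it as UTF-8. The sketch below uses only the JDK (no XmlBeans) and assumes a modern JDK rather than 1.4.2. The class and method names are hypothetical, and the hard-coded ISO-8859-1 merely stands in for a non-UTF-8 platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class Utf8RoundTrip {
    /** Decodes bytes with an assumed charset, then re-encodes the resulting String as UTF-8. */
    static byte[] decodeThenReencode(byte[] encoded, Charset assumed) {
        return new String(encoded, assumed).getBytes(StandardCharsets.UTF_8);
    }

    /** Copies already-encoded bytes to a file untouched; no String conversion is involved. */
    static void writeBytes(ByteArrayOutputStream bos, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            bos.writeTo(out);
        }
    }

    /** Writes the text as UTF-8 bytes to a temp file and checks that it reads back unchanged. */
    static boolean roundTrips(String xml) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Stand-in for paymentDoc.save(bos, printOptions) from the quoted code.
        bos.write(xml.getBytes(StandardCharsets.UTF_8));
        Path tmp = Files.createTempFile("test", ".xml");
        try {
            writeBytes(bos, tmp);
            String readBack = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            return readBack.equals(xml);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Japanese and Korean sample text, written with Unicode escapes.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<name>\u65e5\u672c\u8a9e \ud55c\uad6d\uc5b4</name>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset and re-encoding changes the bytes.
        byte[] mangled = decodeThenReencode(utf8, StandardCharsets.ISO_8859_1);
        System.out.println("bytes preserved via ISO-8859-1: " + Arrays.equals(utf8, mangled));

        // Writing the bytes directly keeps them intact.
        System.out.println("bytes preserved via writeTo: " + roundTrips(xml));
    }
}
```

For pure ASCII content the detour through a String is harmless, which may be why a test file looks fine until Japanese or Korean text shows up. With XmlBeans, passing the FileOutputStream itself to save(...) would presumably avoid the intermediate buffer altogether.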
Radu
________________________________
From: Michael White [mailto:whitemichael@gmail.com]
Sent: Friday, August 18, 2006 2:52 PM
To: user@xmlbeans.apache.org
Subject: Cannot encode my XML document output into UTF-8
I can't properly encode my XML output file and would appreciate any help you could offer!
For example, if I do the following:
<<
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream fos = new FileOutputStream("C:/test.xml");
PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");
XmlOptions printOptions = new XmlOptions();
printOptions.setSavePrettyPrint();
printOptions.setSavePrettyPrintIndent(2);
printOptions.setUseDefaultNamespace();
printOptions.setCharacterEncoding("UTF-8");
paymentDoc.save(bos, printOptions);
xmlStream.print(bos); //xmlStream.print(bos.toString("UTF-8"));
xmlStream.close();
>>
I receive a properly formatted file, with all of the data I require. However, per
TextPad, the encoding is set to ANSI. I've tried numerous combinations of writers
and encodings and can't seem to get the output into UTF-8! I'll be dealing with
Japanese and Korean characters, so it is a necessity.
The crazy part is that if I perform the following:
<<
ByteArrayOutputStream bos = new ByteArrayOutputStream();
FileOutputStream fos = new FileOutputStream("C:/test.xml");
PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");
bos.write("A?u$(He933u3'u(BaÌ3̇".getBytes("UTF-8"));
xmlStream.print(bos);
xmlStream.close();
>>
The resulting file is listed as properly encoded in UTF-8 format!?
I'm at my wits' end. I'm using the latest XmlBeans release as of today and JDK
1.4.2_12. I set the documentProperties encoding to UTF-8 as well, and it just doesn't
want to play nice.
Help!
Thanks, Mike
_______________________________________________________________________
Notice: This email message, together with any attachments, may contain
information of BEA Systems, Inc., its subsidiaries and affiliated
entities, that may be confidential, proprietary, copyrighted and/or
legally privileged, and is intended solely for the use of the individual
or entity named in this message. If you are not the intended recipient,
and have received this message in error, please immediately return this
by email and then delete it.