
List:       xmlbeans-user
Subject:    RE: Cannot encode my XML document output into UTF-8
From:       "Radu Preotiuc-Pietro" <radup () bea ! com>
Date:       2006-08-30 16:54:18
Message-ID: 99479F4D39C9244F8E17E688193A3DD8BD7CB1 () repbex02 ! amer ! bea ! com

I couldn't find the time to look at this in detail, but here's a suggestion that may help:

I think TextPad (like Notepad) looks at the first bytes of the file and, if it sees a byte order mark such as FF FE, decides that the encoding is Unicode. An XML file, however, declares its encoding through the encoding="UTF-8" part of the XML declaration rather than through leading bytes, and TextPad doesn't pick that up. So in other words, I think you're fine. Try putting some non-ASCII chars in your file, open it in TextPad, set the encoding manually to UTF-8, and check that the characters are the same.

The main idea in this story is that there is no "standard" mechanism to decide whether a set of bytes is text in UTF-8 encoding, text in ASCII encoding, or a JPEG image (that's why XML needed an "encoding" attribute, by the way). So as long as you have rules and mechanisms to ensure that the same encoding is used throughout your system, you are OK.
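To make the point concrete, here is a minimal sketch (not taken from the original code; the class and method names are made up for illustration, and it uses `StandardCharsets`, which needs Java 7 or later rather than the JDK 1.4.2 mentioned in the thread). It shows that pure-ASCII XML produces exactly the same bytes in UTF-8 and in US-ASCII, and that Java's UTF-8 encoder does not write a byte order mark, so an editor that only inspects bytes has nothing to go on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingSniffDemo {
    /** True if the text yields identical bytes in UTF-8 and US-ASCII, i.e. it is pure ASCII. */
    static boolean looksSameInAsciiAndUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    /** True if the bytes start with the UTF-8 byte order mark EF BB BF. */
    static boolean hasUtf8Bom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, not the poster's actual data.
        String asciiXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><payment>42</payment>";
        String japaneseXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><name>\u65E5\u672C</name>";

        System.out.println(looksSameInAsciiAndUtf8(asciiXml));     // true
        System.out.println(looksSameInAsciiAndUtf8(japaneseXml));  // false
        System.out.println(hasUtf8Bom(asciiXml.getBytes(StandardCharsets.UTF_8))); // false
    }
}
```

Only the second document contains multi-byte sequences that a byte-sniffing editor could recognise as UTF-8.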
Radu

________________________________

From: Michael White [mailto:whitemichael@gmail.com] 
Sent: Friday, August 18, 2006 2:52 PM
To: user@xmlbeans.apache.org
Subject: Cannot encode my XML document output into UTF-8


I can't properly encode my XML output file and would appreciate any help you could offer!

For example, if I do the following:

<<
    ByteArrayOutputStream bos = new ByteArrayOutputStream();

    FileOutputStream fos = new FileOutputStream("C:/test.xml"); 
    PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");    
       
    XmlOptions printOptions = new XmlOptions();
    printOptions.setSavePrettyPrint();
    printOptions.setSavePrettyPrintIndent (2);
    printOptions.setUseDefaultNamespace();
    printOptions.setCharacterEncoding("UTF-8");

    paymentDoc.save(bos,printOptions);
    xmlStream.print(bos);   //xmlStream.print(bos.toString("UTF-8")); 
    xmlStream.close();
>>

I receive a properly formatted file, with all of the data I require. However, according to TextPad, the encoding is set to ANSI. I've tried numerous combinations of writers and encodings and can't seem to get the output into UTF-8! I'll be dealing with Japanese and Korean characters, so it is a necessity.
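One thing worth checking in the snippet above, offered as a sketch rather than a confirmed diagnosis: `xmlStream.print(bos)` goes through `bos.toString()`, which decodes the buffered bytes with the platform default charset before the `PrintStream` re-encodes them as UTF-8. With ASCII-only content that round trip is harmless, but non-ASCII bytes can be mangled. The example below uses only the JDK (no XmlBeans), hypothetical class and method names, and ISO-8859-1 as a stand-in for a non-UTF-8 platform default; it contrasts that round trip with a byte-for-byte copy via `writeTo`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteCopyDemo {
    /** Copies the buffer byte-for-byte; no charset conversion is involved. */
    static byte[] copyRaw(ByteArrayOutputStream source) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        source.writeTo(target);
        return target.toByteArray();
    }

    /**
     * Mimics print(bos) on a platform whose default charset is ISO-8859-1
     * (an assumption made here so the result is reproducible everywhere).
     */
    static byte[] copyViaString(ByteArrayOutputStream source) throws IOException {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(target, false, "UTF-8");
        out.print(source.toString("ISO-8859-1")); // decode with the "wrong" charset
        out.close();
        return target.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "\u65E5\u672C".getBytes(StandardCharsets.UTF_8); // 6 bytes
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(original);

        System.out.println(Arrays.equals(original, copyRaw(bos)));       // true
        System.out.println(Arrays.equals(original, copyViaString(bos))); // false
    }
}
```

Applied to the snippet above, that would mean either passing `fos` straight to `paymentDoc.save(fos, printOptions)` or calling `bos.writeTo(fos)`, so that the bytes produced for the requested encoding reach the file unchanged.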

The crazy part is that if I perform the following:

<<
ByteArrayOutputStream bos = new ByteArrayOutputStream();

FileOutputStream fos = new FileOutputStream("C:/test.xml");
PrintStream xmlStream = new PrintStream(fos, false, "UTF-8");    

bos.write("A?u$(He933u3'u(BaÌ3̇".getBytes("UTF-8"));
xmlStream.print(bos);
xmlStream.close();
>>

The resulting file is listed as properly encoded in UTF-8 format!?

I'm at my wits' end. I'm using the latest XmlBeans release as of today and JDK 1.4.2_12. I set the documentProperties encoding to UTF-8 as well, and it just doesn't want to play nice.

Help!

Thanks, Mike

_______________________________________________________________________
Notice:  This email message, together with any attachments, may contain
information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated
entities,  that may be confidential,  proprietary,  copyrighted  and/or
legally privileged, and is intended solely for the use of the individual
or entity named in this message. If you are not the intended recipient,
and have received this message in error, please immediately return this
by email and then delete it.

