List:       xerces-j-user
Subject:    RE: Encoding invalid XML characters
From:       "Robert Houben" <Robert.Houben () fusionware ! net>
Date:       2005-11-04 18:08:59
Message-ID: F10F2C39503B924B9A9F5BD08DC358E735F98B () pony ! fusionware ! net

We have a set of middleware connectors for 30-year-old, non-relational
databases, including a number of connectors that produce XML from the
data.  We encountered the same problem and used the mixed content
model, almost identical to what David showed (with an attribute).  That
way the user could either concatenate all the text nodes to ignore the
"funny" character, or reconstitute the original string.  We provided
sample code in Java, VB, C# and a few other languages showing how to
reconstitute these strings and how to generate them.  We avoided the
Base64 method because it made the output no longer "human-readable",
which is one of the *benefits* of XML! ;)
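
For anyone reading this in the archive, here is a rough sketch of the two
ways of reading such an element.  It is not our shipped sample code; the
class name MixedContentDecoder is made up for illustration, and the
element name "char" with its decimal "number" attribute is taken from
David's example below rather than from our actual format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reads <tag>text <char number="12"/> text</tag>.
final class MixedContentDecoder {

    // reconstitute == false: concatenate the text nodes and skip <char/> elements.
    // reconstitute == true:  turn each <char number="N"/> back into code point N.
    static String read(String xml, boolean reconstitute) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            StringBuilder out = new StringBuilder();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                short type = n.getNodeType();
                if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                    out.append(n.getNodeValue());
                } else if (reconstitute
                        && type == Node.ELEMENT_NODE
                        && "char".equals(n.getNodeName())) {
                    // Assumed format: the attribute holds the decimal code point.
                    String number = ((Element) n).getAttribute("number");
                    out.appendCodePoint(Integer.parseInt(number));
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read mixed content", e);
        }
    }
}
```

With the input <tag>Hello <char number="12"/></tag>, the "ignore" mode
gives back "Hello " and the "reconstitute" mode gives back "Hello "
followed by the form feed character.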

-----Original Message-----
From: David Sheldon [mailto:dave@earth.li] 
Sent: Friday, November 04, 2005 5:24 AM
To: j-users@xerces.apache.org
Subject: Re: Encoding invalid XML characters

On Fri, Nov 04, 2005 at 12:52:29PM -0000, Tom Sugden wrote:
> Hello,
> 
> Can anybody suggest the best approach for encoding invalid XML
> characters
> into an XML document? For example, the Unicode character with the
> hexadecimal code 000C can be encoded into a Java character literal as
> follows:
> 
>     char c = '\u000C';
> 
> I tried encoding this character into an XML string using a standard
> character reference. For example:
> 
>     String s = "<tag>&#x000C;</tag>";

I think the easiest way is to make the tag element have the type
base64Binary.

This way the string "\u000C\u000A\u000D" would become

<tag>DAoN</tag>

however the string "Hello world" would become

<tag>SGVsbG8gd29ybGQ=</tag>
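
A minimal sketch of that encoding in Java, assuming the string is turned
into bytes as UTF-8 (I have not said which byte encoding to use above, so
that choice is an assumption) and assuming a JDK with java.util.Base64
(Java 8 or later).  The class name Base64Text is made up for
illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for carrying arbitrary text in a base64Binary element.
final class Base64Text {

    // Encode the string's UTF-8 bytes, so any character survives,
    // whether or not it is allowed in XML.
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of encode(): Base64 text back to the original string.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    // Wrap the encoded value in the example element used in this thread.
    static String toElement(String s) {
        return "<tag>" + encode(s) + "</tag>";
    }
}
```

Note that in Java source the line feed and carriage return have to be
written as "\n" and "\r", because the \u escapes for them are processed
before the compiler reads the string literal.  So the first example
above is Base64Text.toElement("\f\n\r").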

Alternatively you could use mixed content, and so "Hello \u000C" would
become:

<tag>Hello <char number="12"/></tag>

But this one would be harder for consumers of your system to process.
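
Generating the mixed content could look roughly like the sketch below.
It treats a character as "invalid" when it falls outside the Char
production of XML 1.0, and writes it as a char element with a decimal
number attribute as in the example above.  The class and method names
are made up for illustration.

```java
// Hypothetical helper that writes invalid characters as <char number="N"/>.
final class MixedContentEncoder {

    // The Char production of XML 1.0: tab, LF, CR, and the ranges below.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns element content (without the surrounding <tag>...</tag>).
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (!isXml10Char(cp)) {
                // Covers control characters such as U+000C and unpaired surrogates.
                out.append("<char number=\"").append(cp).append("\"/>");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else if (cp == 0xD) {
                // A literal CR would be normalized to LF by the parser.
                out.append("&#13;");
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

Here MixedContentEncoder.encode("Hello \f") produces the content shown
in the example above, while ordinary text passes through with only the
usual markup escaping.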

I wonder what solutions other people have for this problem.

David
-- 
"[Hackers] then only have to crack the password to take control"
  -- IT Week on a terrible Unix security flaw

---------------------------------------------------------------------
To unsubscribe, e-mail: j-users-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-users-help@xerces.apache.org