List:       xalan-j-users
Subject:    HTML Serialization and Handling of Ampersands in HREF Attributes
From:       Klaus Malorny <Klaus.Malorny () knipp ! de>
Date:       2007-04-30 8:46:53
Message-ID: 4635ACFD.9070605 () knipp ! de



Hi,

I have run into a problem using Xalan (and Xerces with the old 
org.apache.xml.serialize package as well) to serialize to HTML. The serializer 
does *NOT* escape an ampersand, either as "&amp;" or as "&#38;", when it occurs 
in an attribute designated to hold a URL, such as the "href" attribute of the 
"a" element. Looking at the source code, it is clear that this is intentional, 
which puzzles me a lot. After a customer complained, I reviewed the issue and 
found that the HTML specifications clearly say that the ampersand, which is 
typically used to separate form values, *MUST* be escaped in attributes 
containing URLs. I even found a corresponding note in the HTML 2.0 
specification from 1995. Can anyone explain why this incorrect handling exists, 
and tell me whether it will be changed in a future release?

HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html section B.2.2
HTML 2.0: http://www.ietf.org/rfc/rfc1866.txt section 8.2.1 (page 46)
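
To make the rule from those sections concrete, here is a minimal sketch of the 
escaping I would expect for a URL placed in an attribute value. The class and 
method names are hypothetical and are not part of Xalan or Xerces; the sketch 
only handles the ampersand and the double quote.

```java
public class HrefEscapeSketch {

    // Hypothetical helper (not a Xalan/Xerces API): escapes the characters
    // that must not appear literally in a double-quoted attribute value.
    public static String escapeAttributeValue(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '&') {
                out.append("&amp;");   // separator between form values
            } else if (c == '"') {
                out.append("&quot;");  // would otherwise end the attribute
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "search?x=1&y=2";
        // prints: <a href="search?x=1&amp;y=2">link</a>
        System.out.println("<a href=\"" + escapeAttributeValue(url) + "\">link</a>");
    }
}
```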

Sample:

test.xml:
- - - 8< - - -
<html>
<body>
<a href="a&amp;b" title="a&amp;b"/>
</body>
</html>
- - - 8< - - -

test.xsl:
- - - 8< - - -
<xsl:stylesheet
   version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/">
     <xsl:copy-of select="/"/>
   </xsl:template>

</xsl:stylesheet>
- - - 8< - - -

command arguments: -in test.xml -xsl test.xsl -HTML

output:
- - - 8< - - -
<html>

<body>

<a href="a&b" title="a&amp;b"></a>

</body>

</html>
- - - 8< - - -
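
The same check can probably be run without the command line through the 
standard JAXP API. The following is only a sketch: it assumes that an identity 
transformer with the output method set to "html" behaves like the copy-of 
stylesheet above with -HTML, and the class name is made up for illustration. 
Which serializer is actually used depends on the TransformerFactory found at 
run time, so the result for "href" may differ from the output shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlSerializeRepro {

    // Same content as test.xml above, on one line.
    public static final String XML =
        "<html><body><a href=\"a&amp;b\" title=\"a&amp;b\"/></body></html>";

    // Copies the input to the output using the HTML output method.
    public static String serializeAsHtml(String xml) {
        try {
            // newTransformer() without a stylesheet yields an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        // Compare how "href" and "title" are written in the result.
        System.out.println(serializeAsHtml(XML));
    }
}
```

Comparing the "href" and "title" attributes in the printed result shows 
whether the serializer in use treats URL attributes differently.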


Regards,

Klaus




