
List:       xalan-dev
Subject:    [jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of code
From:       "Peter De Maeyer (JIRA)" <jira () apache ! org>
Date:       2018-09-14 19:07:00
Message-ID: JIRA.13162262.1527374230000.78454.1536952020122 () Atlassian ! JIRA


     [ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated XALANJ-2617:
------------------------------------
    Attachment: XALANJ-2617_java.patch
                XALANJ-2617_test.patch

> Serializer produces separately escaped surrogate pair instead of codepoint
> --------------------------------------------------------------------------
> 
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone(Ordinary problems in Xalan \
>                 projects.  Anybody can view the issue.) 
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, \
> XALANJ-2617_java.patch, XALANJ-2617_test.patch 
> 
> When serializing XML that contains a character encoded as the Unicode surrogate pair \
> "\uD840\uDC0B", I have tried several approaches and none worked. The XML Transformer \
> emits each half of the surrogate pair as a separate character reference, which makes \
> the XML unparseable, e.g.: SAXParseException; Character reference "&#55360" is an \
> invalid XML character. It looks like a bug introduced by the XALANJ-2271 fix. 
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#55360;&#56331;</a>
> [Fatal Error] :1:50: Character reference "&#
> {code}
> {code:java|title=But Xalan ver. 2.7.0 works OK}
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.0/xalan-2.7.0.jar:/home/kec/.m2/repository/xalan/serializer/2.7.0/serializer-2.7.0.jar:. JI9053942
> Character: 𠀋
> EXPECTED: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
> ACTUAL: <?xml version="1.0" encoding="UTF-8"?><a>&#131083;</a>
> ACTUAL PARSED CHAR 𠀋
> {code}
> {code:java|title=Test}
> String value = "\uD840\uDC0B";
> System.out.println("Character: " + value);
> System.out.println("EXPECTED: <?xml version=\"1.0\" encoding=\"UTF-8\"?><a>&#"
>         + value.codePointAt(0) + ";</a>");
> StringWriter writer = new StringWriter();
> final DocumentBuilder documentBuilder =
>         DocumentBuilderFactory.newInstance().newDocumentBuilder();
> Document dom = documentBuilder.newDocument();
> final Element rootEl = dom.createElement("a");
> rootEl.setTextContent(value);
> dom.appendChild(rootEl);
> Transformer transformer = TransformerFactory.newInstance().newTransformer();
> transformer.transform(new DOMSource(dom), new javax.xml.transform.stream.StreamResult(writer));
> String xml = writer.toString();
> System.out.println(" ACTUAL: " + xml);
> InputSource inputSource = new InputSource();
> inputSource.setCharacterStream(new StringReader(xml));
> System.out.println("ACTUAL PARSED CHAR "
>         + documentBuilder.parse(inputSource).getDocumentElement().getTextContent());
> {code}
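
For readers following the report, the relationship between the two outputs can be sketched in plain Java. The values 55360 and 56331 in the failing output are the UTF-16 code units 0xD840 and 0xDC0B, while 131083 is the single code point U+2000B that the pair encodes. The class and method names below (SurrogateEscapeSketch, escapeNonAscii) are hypothetical and for illustration only; this is not the Xalan serializer code nor the attached patch, just a minimal sketch of escaping by code point rather than by char, assuming well-formed surrogate pairs in the input.

```java
public class SurrogateEscapeSketch {

    /**
     * Hypothetical helper: writes every non-ASCII character as a decimal
     * numeric character reference, iterating by code point so that a
     * surrogate pair yields one reference instead of two.
     * Markup escaping (such as '<' and '&') is deliberately omitted, and an
     * unpaired surrogate is not detected here.
     */
    public static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 2 chars for a supplementary code point, else by 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC0B';

        // The two values that appear in the failing output.
        System.out.println((int) high); // 55360
        System.out.println((int) low);  // 56331

        // The pair combined into the single code point from the expected output.
        System.out.println(Character.toCodePoint(high, low)); // 131083

        System.out.println(escapeNonAscii("\uD840\uDC0B")); // &#131083;
    }
}
```

A serializer that walks the text one char at a time would see 0xD840 and 0xDC0B as two separate values, which matches the ACTUAL line reported for 2.7.2.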



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xalan.apache.org
For additional commands, e-mail: dev-help@xalan.apache.org

