List:       xmlbeans-dev
Subject:    [jira] Created: (XMLBEANS-135) bad handling of embedded CDATA
From:       "Martin Hamel (JIRA)" <xmlbeans-dev () xml ! apache ! org>
Date:       2005-03-31 15:09:33
Message-ID: 1127098608.1112281773637.JavaMail.jira () ajax ! apache ! org

bad handling of embedded CDATA
------------------------------

         Key: XMLBEANS-135
         URL: http://issues.apache.org/jira/browse/XMLBEANS-135
     Project: XMLBeans
        Type: Bug
    Versions: Version 1.0.3, Version 1.0.4, Version 2 Beta 1    
 Environment: Observed on Windows with JDK 1.4.2.
    Reporter: Martin Hamel


I have a case of bad XML. It is an envelope document that includes another 
document. The parser expects the enclosed document to be in a CDATA section. 
The problem is that the second document now includes a third document, which 
is also expected to be in a CDATA section. 


I create document A with an XMLBean. After converting document A to a string 
with xmlText(), I put it in as a text element of document B. I then do the 
same with document B by putting it in document C. Everything works well and 
automatically, and it creates a CDATA section every time it needs to.

        // fragment
        XmlOptions options = new XmlOptions();
        options.setSavePrettyPrint();
        Field field = getAssessmentFields().addNewField();
        field.setFieldName("AssessmentContent");
        field.setFieldValue(answersDocument.xmlText(options));
        ..


The problem is that on the second escaping, the CDATA end marker (]]>) has its 
">" escaped to "&gt;". The SAX parser that reads all this (Xalan) just can't 
handle it. Also, the XML specification says that CDATA sections cannot nest.
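
For comparison, a common way to carry a literal "]]>" inside CDATA is to split 
it across two adjacent CDATA sections, so that no single section contains the 
end marker. The sketch below is only an illustration of that technique: the 
class CdataSplitter and its methods are hypothetical names, not part of 
XMLBeans, and it uses the JDK's built-in DOM parser rather than Xalan.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of XMLBeans.
final class CdataSplitter {

    /** Wraps text in CDATA, splitting every "]]>" across two sections. */
    static String wrap(String text) {
        // "]]>" becomes "]]" (end of one section) + ">" (start of the next).
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses <field>wrapped text</field> and returns the text the parser sees. */
    static String roundTrip(String text) {
        String xml = "<field>" + wrap(text) + "</field>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates the adjacent CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // An inner document that already contains its own CDATA section.
        String inner = "<b><![CDATA[<a>1 & 2</a>]]></b>";
        System.out.println(wrap(inner));
        System.out.println(roundTrip(inner).equals(inner));
    }
}
```

With this approach the receiving parser sees the inner text unchanged, 
including its own "]]>", without any entity appearing inside the CDATA.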

Here is the modification I made for embedded CDATA. Do you think it would be 
worth including?

Here is the entitizeContent method in Saver.java:

        Pattern cdataPattern = Pattern.compile("CDATA");


        private void entitizeContent ( )
        {
            if (_lastEmitCch == 0)
                return;

            int i = _lastEmitIn;
            final int n = _buf.length;

            boolean hasOutOfRange = false;
            
            int count = 0;
            for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
            {                
                char ch = _buf[ i ];

                if (ch == '<' || ch == '&')
                    count++;
                else if (isBadChar( ch ))
                    hasOutOfRange = true;

                if (++i == n)
                    i = 0;
            }

            if (count == 0 && !hasOutOfRange)
                return;

            i = _lastEmitIn;

            //
            // Heuristic for knowing when to save out stuff as a CDATA.
            //
            
            // We'll check if we have a CDATA in the buffer.
            // If we do, we won't nest another one.
            CharBuffer charBuffer = CharBuffer.wrap(_buf);
            boolean hasCDATA = cdataPattern.matcher(charBuffer).find();            

            if (_lastEmitCch > 32 && count > 5 &&
                    count * 100 / _lastEmitCch > 1 && !hasCDATA)
            {
                boolean lastWasBracket = _buf[ i ] == ']';

                i = replace( i, "<![CDATA[" + _buf[ i ] );

                boolean secondToLastWasBracket = lastWasBracket;

                lastWasBracket = _buf[ i ] == ']';

                if (++i == _buf.length)
                    i = 0;

                for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
                {
                    char ch = _buf[ i ];

                    if (ch == '>' && secondToLastWasBracket && lastWasBracket)
                        i = replace( i, "&gt;" );
                    else if (isBadChar( ch ))
                        i = replace( i, "?" );
                    else
                        i++;

                    secondToLastWasBracket = lastWasBracket;
                    lastWasBracket = ch == ']';

                    if (i == _buf.length)
                        i = 0;
                }

                emit( "]]>" );
            }
            else
            {
                for ( int cch = _lastEmitCch ; cch > 0 ; cch-- )
                {
                    char ch = _buf[ i ];

                    if (ch == '<')
                        i = replace( i, "&lt;" );
                    else if (hasCDATA && ch == '>')
                        i = replace(i, "&gt;");
                    else if (ch == '&')
                        i = replace( i, "&amp;" );
                    else if (isBadChar( ch ))
                        i = replace( i, "?" );
                    else
                        i++;

                    if (i == _buf.length)
                        i = 0;
                }
            }
        }
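
One possible refinement: CharBuffer.wrap(_buf) makes the matcher scan the whole 
buffer, not only the text that was just emitted. The loops above suggest that 
_buf is used as a circular buffer, with the emitted text starting at 
_lastEmitIn and spanning _lastEmitCch characters; that reading is an 
assumption. Under it, a search limited to the emitted region might look like 
the sketch below. RingSearch and contains are hypothetical names, not part of 
Saver.java.

```java
// Hypothetical helper for illustration only; not part of XMLBeans' Saver.
final class RingSearch {

    /**
     * Returns true if needle occurs within the len characters that start at
     * index start in buf, where indices wrap around at buf.length.
     */
    static boolean contains(char[] buf, int start, int len, String needle) {
        int m = needle.length();
        if (m == 0)
            return true;

        // Try every offset at which the needle still fits inside the region.
        for (int off = 0; off + m <= len; off++) {
            int k = 0;
            while (k < m && buf[(start + off + k) % buf.length] == needle.charAt(k))
                k++;
            if (k == m)
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        char[] buf = new char[16];
        java.util.Arrays.fill(buf, '.');

        // Write the text so that it wraps past the end of the buffer.
        String text = "x<![CDATA[y";
        int start = 10;
        for (int k = 0; k < text.length(); k++)
            buf[(start + k) % buf.length] = text.charAt(k);

        System.out.println(contains(buf, start, text.length(), "<![CDATA["));
    }
}
```

In entitizeContent the call would then be something like 
contains(_buf, _lastEmitIn, _lastEmitCch, "<![CDATA["), which also avoids 
matching the bare word "CDATA" in ordinary text.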

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xmlbeans.apache.org
For additional commands, e-mail: dev-help@xmlbeans.apache.org
