
List:       xmlbeans-dev
Subject:    [jira] Created: (XMLBEANS-295) setLoadStripWhitespace() api errors
From:       "David RR Webber (JIRA)" <xmlbeans-dev () xml ! apache ! org>
Date:       2006-11-02 17:11:17
Message-ID: 5682671.1162487477077.JavaMail.root () brutus

setLoadStripWhitespace() api errors when trimming white space characters
------------------------------------------------------------------------

                 Key: XMLBEANS-295
                 URL: http://issues.apache.org/jira/browse/XMLBEANS-295
             Project: XMLBeans
          Issue Type: Bug
          Components: Validator
    Affects Versions: Version 2.2.1
         Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
            Reporter: David RR Webber
             Fix For: TBD


Situation Summary

We deployed to production using the setLoadStripWhitespace() API in XMLBeans.
After some days we started getting intermittent failures from occasional XML
transactions.

After a week of investigation, and having eliminated all other factors, we
concluded that the flushText() method itself was the cause.  Specifically, we
have determined that in character strings containing the & character the space
immediately after the & is stripped - e.g. <company>B & H Photo</company>
becomes <company>B &H Photo</company>.

We realize that there is a patch available for & processing, and we are
currently testing it to see if it cures the problem relating to &
(http://issues.apache.org/jira/browse/XMLBEANS-274).

However, we are also seeing an intermittent problem in our UNIX environment
associated with the colon character ":" (other characters could be involved as
well - we do not have a definitive list).  Spaces are intermittently being
trimmed in various fields that do not contain "&" (the case originally reported
in XMLBEANS-274).  We cannot reproduce this on our Windows development systems,
but it is happening intermittently on SunOS.

Again, a space either immediately following the colon or later in the string is
stripped.  For tokenized elements, e.g., <urgent>Yes: Y</urgent> becomes
<urgent>Yes:Y</urgent>, and the object then returns a NULL value because this is
no longer a valid allowed value for the tokenized list.  Similarly,
<location>USA: United States</location> became
<location>USA: UnitedStates</location>.  We suspect that a character preceding
the colon might be triggering this behaviour, but we have not yet determined
when or how.  This illustrates how complex this issue is in terms of the current
XMLBeans implementation approach.
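
For reference, the result we would expect for these tokenized values can be
sketched as follows.  This is only an illustration, assuming the elements are
typed as xs:token (or a type derived from it), for which XML Schema defines
whitespace "collapse": tabs and line breaks become spaces, runs of spaces shrink
to one, and leading and trailing spaces are removed.  The class and method names
are hypothetical and are not part of XMLBeans.

```java
// Hypothetical helper, not XMLBeans code: XML Schema style whitespace
// "collapse" for token values. Interior single spaces must survive.
final class WhitespaceCollapseSketch {

    static String collapse(String lexical) {
        StringBuilder out = new StringBuilder(lexical.length());
        boolean pendingSpace = false;
        for (int i = 0; i < lexical.length(); i++) {
            char c = lexical.charAt(i);
            boolean isWhitespace = c == ' ' || c == '\t' || c == '\n' || c == '\r';
            if (isWhitespace) {
                // Remember one separator, but never emit a leading one.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        // A pending space at the end is dropped, which trims the tail.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  Yes:  Y \n") + "]");
        System.out.println("[" + collapse("USA: United\tStates") + "]");
    }
}
```

Under this rule the space after the colon, and the space between "United" and
"States", are kept; only redundant or surrounding whitespace goes away.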

Analysis

We have looked at how and where XMLBeans does the white space trim during the
unmarshalling of the XML content.  When it detects white space, it invokes a
stripRight() method loop.  We are not convinced that this is architecturally
sound at the point where it is employed: it adds complexity and evidently leaves
many edge conditions, with some combinations of characters not handled
consistently and correctly.

Our preferred approach would be to defer the white space trim until after
unmarshalling, so that the initial process can treat the XML content "as is"
between the angle brackets.  Once the content has been extracted, a simple Java
String trim() can be applied to it.  This could be provided as an alternate
method call to the current setLoadStripWhitespace() API, one that iterates
through the entire structure of objects instead of the original XML stream.  The
only check that would be necessary is whether the XML markup itself sets the
xml:space="preserve" attribute for an element, in which case the trim() would be
skipped automatically for that content item.  What is happening right now is
that the existing flushText() method is mixing up XML markup and content;
instead there needs to be a clear separation between the element angle brackets
and attribute quotes on the one hand, and the content itself on the other.
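
A minimal sketch of this deferred trim, using the JDK's DOM parser rather than
XMLBeans internals, might look like the following.  The class and method names
are hypothetical, and the sketch assumes no mixed content (text interleaved with
child elements), since it trims every text node it visits.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration, not XMLBeans code: parse first, trim afterwards.
final class DeferredTrimSketch {

    // Parse the XML "as is"; no whitespace handling happens at this stage.
    static Document parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Merge adjacent text nodes so a value is never trimmed piecewise.
            doc.normalize();
            return doc;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Walk the tree and trim text content, honouring xml:space inheritance.
    static void trimText(Node node, boolean preserve) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String space = ((Element) node).getAttribute("xml:space");
            if ("preserve".equals(space)) {
                preserve = true;
            } else if ("default".equals(space)) {
                preserve = false;
            }
        } else if (node.getNodeType() == Node.TEXT_NODE && !preserve) {
            // Plain String.trim(): only leading and trailing whitespace goes.
            node.setNodeValue(node.getNodeValue().trim());
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            trimText(child, preserve);
        }
    }
}
```

Calling trimText(doc.getDocumentElement(), false) after parse() leaves the
interior of each value untouched, because the trim only ever sees the complete,
already-decoded string and never the markup around it.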

One caveat: perhaps the current approach is intended to run before error
checking on tokenized lists, to prevent failures there due to extra spaces?
Even so, it is not cleanly enough separated, and again it would be simpler to
use a Java String trim() method within the tokenized evaluation itself, on just
the string.
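
As a rough sketch of that idea, the trim could sit directly in front of the
allowed-value lookup.  Everything here is hypothetical: the class name, the
method name, and the "No: N" entry are invented for illustration, and only
"Yes: Y" comes from our actual data.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration, not XMLBeans code: trim inside the
// tokenized-value check instead of while reading the XML stream.
final class TokenEnumSketch {

    // Assumed allowed values for <urgent>; "No: N" is made up for the example.
    private static final Set<String> URGENT_VALUES =
            new HashSet<>(Arrays.asList("Yes: Y", "No: N"));

    // Returns the matched allowed value, or null when there is no match.
    static String matchUrgent(String rawContent) {
        String candidate = rawContent.trim();
        return URGENT_VALUES.contains(candidate) ? candidate : null;
    }
}
```

With this arrangement, matchUrgent("  Yes: Y \n") matches the allowed value,
while the damaged string "Yes:Y" produced by the current stripping yields null,
which is the failure we see in production.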

Suggested Solution

Refactor the current setLoadStripWhitespace() white space handling to delay
string manipulation on content until after the content and XML markup have been
unpacked, instead of before as currently happens.  This makes for much simpler
white space trim logic (it can simply use the Java String class method) that
does not need to look for markup artifacts as well.

We are not clear on who owns this particular feature in XMLBeans, or whether
they are currently available to assist with this, but we would be prepared to
work with the team to develop a better solution here.



