List: xmlbeans-dev
Subject: [jira] Assigned: (XMLBEANS-295) setLoadStripWhitespace() api errors
From: "Wing Yew Poon (JIRA)" <xmlbeans-dev () xml ! apache ! org>
Date: 2009-10-25 3:15:59
Message-ID: 1968372368.1256440559521.JavaMail.jira () brutus
[ https://issues.apache.org/jira/browse/XMLBEANS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wing Yew Poon reassigned XMLBEANS-295:
--------------------------------------
Assignee: Cezar Andrei
> setLoadStripWhitespace() api errors when trimming white space characters
> ------------------------------------------------------------------------
>
> Key: XMLBEANS-295
> URL: https://issues.apache.org/jira/browse/XMLBEANS-295
> Project: XMLBeans
> Issue Type: Bug
> Components: Validator
> Affects Versions: Version 2.2.1
> Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
> Reporter: David RR Webber
> Assignee: Cezar Andrei
> Fix For: TBD
>
>
> Situation Summary
> We deployed to production using the setLoadStripWhitespace() api in XMLBeans.
> After some days we started getting intermittent failures from occasional XML
> transactions. After a week of investigation, having eliminated all other
> factors, we realized that the flushText() method itself was the cause.
> Specifically, we have determined that character strings containing the &
> character result in spaces being stripped immediately after the & - e.g.
> <company>B & H Photo</company> becomes <company>B &H Photo</company>. We
> realize that there is a patch available for & processing, and we are
> currently testing it to see if it cures the problem relating to &
> (http://issues.apache.org/jira/browse/XMLBEANS-274).
>
> However, we are also seeing an intermittent problem in our UNIX environment
> associated with the colon ":" (it could be other characters as well - we do
> not have a definitive list). We found spaces being trimmed intermittently in
> various fields that do not contain "&" (the character in the original
> XMLBEANS-274 bug report). We cannot reproduce this one on our Windows
> development systems, but it is happening intermittently on SunOS. Again, a
> space either immediately following the colon or later in the string is
> stripped. For tokenized elements, e.g. <urgent>Yes: Y</urgent> becomes
> <urgent>Yes:Y</urgent>, the object then returns a NULL value because the
> result is no longer an allowed value in the tokenized list. Similarly,
> <location>USA: United States</location> became
> <location>USA: UnitedStates</location>. We suspect that a character before
> the colon might be triggering this behaviour, but we have not yet determined
> when or how. This illustrates how complex this issue is in terms of the
> current XMLBeans implementation approach.
>
> Analysis
> We have looked at how and where XMLBeans does the white space trim during
> the unmarshalling of the XML content. When it detects a white space
> character, it invokes a stripRight() method loop. We are not convinced that
> this is architecturally sound at the point where it is employed: it leads to
> complexity, a lot of edge conditions, and some combinations of characters
> that are not handled consistently and correctly.
>
> Our preferred approach would be to defer the white space trim until
> post-unmarshalling, so that the initial process can treat the XML content
> "as is" between the angle brackets and apply the trim only once the content
> has been extracted. At that point a simple Java String trim() can be
> employed. This could be provided as an alternate method call to the current
> setLoadStripWhitespace() api that would iterate through the entire structure
> of objects instead of the original XML stream. The only check that would be
> necessary is whether the XML markup itself sets the xml:space="preserve"
> attribute for an element, in which case the trim() would be skipped
> automatically for that content item.
>
> What is happening right now is that the existing flushText() method mixes up
> XML markup and content. Instead, there needs to be a clear separation
> between the element angle brackets and attribute quotes on one side and the
> content itself on the other. One caveat: maybe the current approach is
> intended to run before error checking on tokenized lists, to prevent
> failures there due to extra spaces? Even so, it is not cleanly enough
> separated, and it would again be simpler to use the Java String trim()
> method within the tokenized evaluation itself, on just the string.
>
> Suggested Solution
> Refactor the current setLoadStripWhitespace() api to delay string
> manipulation on content until after the content and XML markup have been
> unpacked, instead of before as currently happens. This makes for much
> simpler white space trim logic (it can simply use the Java String trim()
> method) that does not also need to look for markup artifacts. We are not
> clear on who owns this particular feature in XMLBeans, or whether they are
> currently available to assist with this, but we would be prepared to work
> with the team to develop a better solution here.
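
The deferred trim proposed in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not XMLBeans code: it uses the JDK's built-in DOM parser in place of XMLBeans unmarshalling, the class and method names (DeferredTrim, parseThenTrim, trimLeaves) are hypothetical, and the sample input assumes the ampersand arrives escaped as &amp;. The document is parsed untouched first, then String.trim() is applied to the text of leaf elements, skipping anything under xml:space="preserve".

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DeferredTrim {

    /** Parses the XML as is, then trims leaf text in a separate pass. */
    public static Document parseThenTrim(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for getAttributeNS on xml:space
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        trimLeaves(doc.getDocumentElement(), false);
        return doc;
    }

    /** Walks the tree; 'preserve' carries the inherited xml:space setting. */
    public static void trimLeaves(Element element, boolean preserve) {
        String space = element.getAttributeNS(XMLConstants.XML_NS_URI, "space");
        if ("preserve".equals(space)) {
            preserve = true;
        } else if ("default".equals(space)) {
            preserve = false;
        }

        boolean hasChildElements = false;
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElements = true;
                trimLeaves((Element) child, preserve);
            }
        }

        // Only leaf elements are trimmed, and only at the two ends of the
        // string; interior spaces such as the one in "Yes: Y" are untouched.
        if (!hasChildElements && !preserve) {
            element.setTextContent(element.getTextContent().trim());
        }
    }

    private static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order>"
                + "<company>  B &amp; H Photo </company>"
                + "<urgent> Yes: Y </urgent>"
                + "<note xml:space=\"preserve\">  keep me  </note>"
                + "</order>";
        Document doc = parseThenTrim(xml);
        System.out.println("[" + text(doc, "company") + "]"); // [B & H Photo]
        System.out.println("[" + text(doc, "urgent") + "]");  // [Yes: Y]
        System.out.println("[" + text(doc, "note") + "]");    // [  keep me  ]
    }
}
```

Because the trim runs on already-extracted strings, it never has to distinguish markup from content, which is the separation the report argues for. Note that setTextContent() replaces all children of a leaf element, so comments inside a leaf would be dropped in this sketch.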
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xmlbeans.apache.org
For additional commands, e-mail: dev-help@xmlbeans.apache.org