'[jira] Assigned: (XMLBEANS-295) setLoadStripWhitespace() api errors'


List:       xmlbeans-dev
Subject:    [jira] Assigned: (XMLBEANS-295) setLoadStripWhitespace() api errors
From:       "Wing Yew Poon (JIRA)" <xmlbeans-dev () xml ! apache ! org>
Date:       2009-10-25 3:15:59
Message-ID: 1968372368.1256440559521.JavaMail.jira () brutus


     [ https://issues.apache.org/jira/browse/XMLBEANS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wing Yew Poon reassigned XMLBEANS-295:
--------------------------------------

    Assignee: Cezar Andrei

> setLoadStripWhitespace() api errors when trimming white space characters
> ------------------------------------------------------------------------
> 
> Key: XMLBEANS-295
> URL: https://issues.apache.org/jira/browse/XMLBEANS-295
> Project: XMLBeans
> Issue Type: Bug
> Components: Validator
> Affects Versions: Version 2.2.1
> Environment: SunOS 5.9 and Microsoft Windows XP SP2, Java 1.4.2
> Reporter: David RR Webber
> Assignee: Cezar Andrei
> Fix For: TBD
> 
> 
> Situation Summary
> We went to production using the setLoadStripWhitespace() API in XMLBeans.
> After some days we started getting intermittent failures from occasional XML
> transactions. After a week of investigation, having eliminated all other
> factors, we concluded that the flushText() method itself was the cause.
> Specifically, we have determined that in character strings containing the &
> character, spaces immediately after the & are stripped, e.g.
> <company>B & H Photo</company> becomes <company>B &H Photo</company>.
> We realize that there is a patch available for & processing, and we are
> currently testing it to see if it cures the problem relating to &
> (http://issues.apache.org/jira/browse/XMLBEANS-274).
> 
> However, we are also seeing an intermittent problem in our UNIX environment
> associated with the colon ":" (other characters could be involved as well; we
> do not have a definitive list). What we found is spaces being trimmed
> intermittently in various fields that do not contain "&" (the subject of the
> original XMLBEANS-274 bug report). We cannot reproduce this one on our Windows
> development systems, but it is happening intermittently on SunOS. Again, a
> space either immediately following the colon or later in the string is
> stripped. For tokenized elements, e.g., <urgent>Yes: Y</urgent> becomes
> <urgent>Yes:Y</urgent>, and the object then returns a NULL value because the
> result is no longer an allowed value for the tokenized list. Similarly,
> <location>USA: United States</location> became
> <location>USA: UnitedStates</location>. We suspect that a character before the
> colon might be triggering this behaviour, but we have not yet determined when
> or how. This illustrates how complex this issue is in terms of the current
> XMLBeans implementation approach.
> 
> Analysis
> We have looked at how and where XMLBeans does the white space trim during the
> unmarshalling of the XML content. When it detects white space, it invokes a
> stripRight() method loop. We are not convinced that this is architecturally
> sound at the point where it is employed: it leads to complexity, a lot of edge
> conditions, and some combinations of characters that are not handled
> consistently and correctly.
> 
> Our preferred approach would be to defer the white space trim until after
> unmarshalling, so that the initial process can treat the XML content "as is"
> between the angle brackets and apply the trim only once the content has been
> extracted. At that point a simple Java String trim() can be employed. This
> could be provided as an alternate method call to the current
> setLoadStripWhitespace() API that would iterate through the entire structure
> of objects instead of the original XML stream. The only check that would be
> necessary is whether the XML markup itself sets the xml:space="preserve"
> attribute for an element object, in which case the trim() would automatically
> be skipped for that content item.
> 
> What is happening right now is that the existing flushText() method mixes up
> XML markup and content. Instead, there needs to be a clear separation between
> the element angle brackets and attribute quotes on the one hand and the
> content itself on the other. One caveat: perhaps the current approach is
> intended to run before error checking on tokenized lists, to prevent failures
> there due to extra spaces? Even so, it is not cleanly enough separated, and it
> would again be simpler to use the Java String trim() method within the
> tokenized evaluation itself, on just the string.
> 
> Suggested Solution
> Refactor the current setLoadStripWhitespace() API so that string manipulation
> on content is delayed until after the content and XML markup have been
> unpacked, instead of happening beforehand as it does now. This makes for much
> simpler white space trim logic (it can simply use the Java String trim()
> method) that does not need to look for markup artifacts as well. We are not
> clear on who owns this particular feature in XMLBeans, or whether they are
> currently available to assist with this, but we would be prepared to work with
> the team to develop a better solution here.
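
The deferred trim proposed in the description above can be sketched roughly as follows. This is only an illustration of the idea, not XMLBeans code: it uses the JDK's built-in DOM parser rather than the XMLBeans store, and the class name DeferredTrimSketch and its methods parse() and trimContent() are hypothetical names made up for this sketch. The content is parsed "as is" first; String.trim() is then applied to text-only elements, and any subtree marked xml:space="preserve" is skipped.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch of a post-parse trim; not part of XMLBeans.
final class DeferredTrimSketch {

    /** Parses the XML text as is; no whitespace handling happens here. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look up xml:space by namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Walks the tree after parsing and trims the text of elements that
     * contain no child elements. Mixed and element-only content is left
     * untouched in this sketch. A subtree is skipped while
     * xml:space="preserve" is in effect.
     */
    static void trimContent(Element element, boolean preserve) {
        String space = element.getAttributeNS(XMLConstants.XML_NS_URI, "space");
        if ("preserve".equals(space)) {
            preserve = true;
        } else if ("default".equals(space)) {
            preserve = false;
        }

        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                trimContent((Element) child, preserve);
            }
        }

        if (!hasChildElement && !preserve) {
            // getTextContent() returns the whole string with entities already
            // resolved, so "&" and ":" are ordinary characters at this point.
            // Note: setTextContent() also drops comments inside the element.
            element.setTextContent(element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        Document doc = parse(
            "<order>"
            + "<company>  B &amp; H Photo </company>"
            + "<urgent> Yes: Y </urgent>"
            + "<note xml:space=\"preserve\">  keep  </note>"
            + "</order>");
        trimContent(doc.getDocumentElement(), false);
        for (String tag : new String[] {"company", "urgent", "note"}) {
            String text = doc.getElementsByTagName(tag).item(0).getTextContent();
            System.out.println(tag + " = [" + text + "]");
        }
        // prints:
        // company = [B & H Photo]
        // urgent = [Yes: Y]
        // note = [  keep  ]
    }
}
```

Because the trim only ever sees the complete, already-decoded string of an element, interior spaces after "&" or ":" are never candidates for removal; only leading and trailing whitespace is affected. Note that String.trim() removes every character at or below U+0020, which is slightly broader than the XML definition of white space.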

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@xmlbeans.apache.org
For additional commands, e-mail: dev-help@xmlbeans.apache.org

