
List:       xerces-c-dev
Subject:    [jira] Commented: (XERCESC-1369) Performance: improve end-of-line handling
From:       "Christian Will (JIRA)" <xerces-c-dev () xml ! apache ! org>
Date:       2005-03-23 17:45:29
Message-ID: 369017194.1111599929283.JavaMail.jira () ajax ! apache ! org

     [ http://issues.apache.org/jira/browse/XERCESC-1369?page=comments#action_61415 ]
     
Christian Will commented on XERCESC-1369:
-----------------------------------------

Hi David,

It looks good.

Thanks,
Christian

> Performance: improve end-of-line handling
> -----------------------------------------
> 
> Key: XERCESC-1369
> URL: http://issues.apache.org/jira/browse/XERCESC-1369
> Project: Xerces-C++
> Type: Improvement
> Components: Miscellaneous
> Versions: 2.6.0
> Reporter: Christian Will
> Priority: Minor
> Attachments: XMLReader.cpp.patch, XMLReader.hpp.patch
> 
> We can improve the end-of-line handling in two steps.
> 1. We move the function XMLReader::handleEOL(...) from the header into the cpp file,
>    because the function is too big for inlining.
> 2. We create bit masks to avoid most of the handleEOL calls.
> Here are two examples:
> a)
> We already know that the current character is a whitespace. The bit mask selects
> all cases where we have to call handleEOL.
> if (isWhitespace(curCh))
> {
>     //
>     //  'curCh' is a whitespace (x20|x9|xD|xA), so we can only have
>     //  end-of-line combinations with a leading chCR(xD) or chLF(xA)
>     //
>     //  100000 x20
>     //  001001 x9
>     //  001010 chLF
>     //  001101 chCR
>     //  -----------
>     //  000110 == (chCR|chLF) & ~(0x9|0x20)
>     //
>     //  if the result of the bitwise-& operation is
>     //  non-zero : 'curCh' must be xA  or xD
>     //  zero     : 'curCh' must be x20 or x9
>     //
>     if ( ( curCh & (chCR|chLF) & ~(0x9|0x20) ) == 0 )
>     {
>         fCurCol++;
>     } else
>     {
>         handleEOL(curCh, false);
>     }
> }
> b)
> We know nothing about the character, so we have to test for all four possible start
> characters. The bit mask selects only 128 cases (down from 63483 before) where we
> have to call handleEOL.
> //
> // we can have end-of-line combinations with a leading
> // chCR(xD), chLF(xA), chNEL(x85), or chLineSeparator(x2028)
> //
> // 0000000000001101 chCR
> // 0000000000001010 chLF
> // 0000000010000101 chNEL
> // 0010000000101000 chLineSeparator
> // -----------------------
> // 1101111101010000 == ~(chCR|chLF|chNEL|chLineSeparator)
> //
> // if the result of the bitwise-& operation is
> // non-zero : 'chGotten' cannot be chCR, chLF, chNEL or chLineSeparator
> // zero     : 'chGotten' can be chCR, chLF, chNEL or chLineSeparator
> //
> if ( chGotten & (XMLCh) ~(chCR|chLF|chNEL|chLineSeparator) )
> {
>     fCurCol++;
> } else
> {
>     handleEOL(chGotten, false);
> }
> I created bit masks for all (5) cases where we call handleEOL.
> I attached patch files against the latest CVS version.
> Regards,
> Christian Will
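
For anyone who wants to check the two masks quoted above without applying the patches, here is a minimal standalone sketch. It is not taken from XMLReader.cpp or from the attached patch files. The 16-bit XMLCh typedef, the local constants (values copied from the bit tables in the description) and the three helper names are assumptions made only for this illustration.

```cpp
#include <cstdint>

// Assumption: XMLCh is a 16-bit unsigned type. The constants below are local
// stand-ins carrying the values shown in the bit tables of the description.
typedef std::uint16_t XMLCh;
const XMLCh chLF            = 0x000A;
const XMLCh chCR            = 0x000D;
const XMLCh chNEL           = 0x0085;
const XMLCh chLineSeparator = 0x2028;

// Example a): 'curCh' is already known to be one of x20, x9, xA, xD.
// The mask (chCR|chLF) & ~(0x9|0x20) evaluates to 0x6, which overlaps
// xA and xD but neither x20 nor x9.
// Hypothetical helper: returns true when handleEOL would have to be called.
inline bool whitespaceNeedsEOL(XMLCh curCh)
{
    return (curCh & (chCR | chLF) & ~(0x9 | 0x20)) != 0;
}

// Example b): nothing is known about 'chGotten'.
// A character that has any bit set outside the union of the four
// line-break code points cannot be one of them, so only the remaining
// characters are passed on for the exact check.
// Hypothetical helper: returns true when handleEOL would have to be called.
inline bool mayStartEOL(XMLCh chGotten)
{
    return (chGotten & (XMLCh) ~(chCR | chLF | chNEL | chLineSeparator)) == 0;
}

// Counts how many of the 65536 possible 16-bit values pass the filter of b).
// Hypothetical helper, used only to reproduce the "128 cases" figure.
inline int countEOLCandidates()
{
    int count = 0;
    for (unsigned c = 0; c <= 0xFFFF; ++c)
    {
        if (mayStartEOL(static_cast<XMLCh>(c)))
            ++count;
    }
    return count;
}
```

The union of the four code points is 0x20AF, which has seven bits set, so countEOLCandidates() yields 2^7 = 128, matching the figure in the description. The filter in b) is deliberately loose: x20, for instance, passes it although it is not a line break, so the exact decision is presumably still left to handleEOL.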

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
If you want more information on JIRA, or have a bug to report see:
   http://www.atlassian.com/software/jira



