List:       xalan-dev
Subject:    [jira] [Created] (XALANJ-2540) Very inefficient default behaviour
From:       "Lukas Eder (JIRA)" <xalan-dev () xml ! apache ! org>
Date:       2011-06-14 13:56:47
Message-ID: 146744869.2665.1308059807321.JavaMail.tomcat () hel ! zones ! apache ! org

Very inefficient default behaviour for looking up DTMManager
------------------------------------------------------------

                 Key: XALANJ-2540
                 URL: https://issues.apache.org/jira/browse/XALANJ-2540
             Project: XalanJ2
          Issue Type: Improvement
      Security Level: No security risk; visible to anyone (Ordinary problems in
                      Xalan projects. Anybody can view the issue.)
          Components: DTM, XPath
    Affects Versions: 2.7, 2.7.1
            Reporter: Lukas Eder


I have analysed an issue that has been bothering me for some time. When executing
XPath evaluations, a very significant amount of time is spent in the
initialisation of the XPathContext. I have asked this question on Stack Overflow and
answered it myself:

http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance


I think the default behaviour of
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal and
should be improved by caching the result statically. It is unlikely that this
configuration is going to change once classes have been loaded. Hence, the fallback
lookup of META-INF/services/org.apache.xml.dtm.DTMManager should only be done once.
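
To illustrate what "only be done once" could look like, here is a minimal sketch
using the initialization-on-demand holder idiom. It is not Xalan's code: the class
CachedLookup, the method slowLookup() and the counter are hypothetical names, and
slowLookup() merely stands in for the system property check plus the services-file
fallback described above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: runs an expensive lookup once and caches the result. */
final class CachedLookup {
    // Counts how often the expensive path actually runs (for illustration only).
    static final AtomicInteger SLOW_LOOKUPS = new AtomicInteger();

    // Holder idiom: the JVM runs this initializer exactly once, in a thread-safe way,
    // the first time Holder is referenced.
    private static final class Holder {
        static final String CLASS_NAME = slowLookup();
    }

    // Stand-in for the real lookup (system property first, then a default).
    // The default class name is the one named in this report.
    private static String slowLookup() {
        SLOW_LOOKUPS.incrementAndGet();
        String configured = System.getProperty("org.apache.xml.dtm.DTMManager");
        return configured != null ? configured : "org.apache.xml.dtm.ref.DTMManagerDefault";
    }

    /** Returns the cached class name; later calls do no lookup and take no lock. */
    static String dtmManagerClassName() {
        return Holder.CLASS_NAME;
    }
}
```

With this shape, repeated calls to CachedLookup.dtmManagerClassName() return the
same string and the slow path runs a single time. The trade-off is that a property
changed after the first call is no longer picked up.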

For reference, here's the question and answer again in JIRA:

----

I have come to the astonishing conclusion that this:

    Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
    String result = e.getTextContent();

seems to be an incredible 100x faster than this:

    // Accounts for 30%, can be cached
    XPathFactory factory = XPathFactory.newInstance();

    // Negligible
    XPath xpath = factory.newXPath();

    // Accounts for 70% (caching a compiled expression doesn't change much...)
    String result = (String) xpath.evaluate(
      "//SomeElementName", document, XPathConstants.STRING);

I'm using the JVM's default implementation of JAXP:

org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl

I'm really confused, because it's easy to see how JAXP could optimise the above XPath
query to actually execute a simple getElementsByTagName() instead. But it doesn't
seem to do that. This problem is limited to around 5-6 frequently used XPath calls
that are abstracted and hidden by an API. Those queries involve simple paths (e.g.
/a/b/c, without variables or conditions) against an always available DOM Document
only. So, if an optimisation can be done, it will be quite easy to achieve.
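
The comments in the snippet above note that the XPathFactory can be cached. A
minimal sketch of that reuse, with only standard javax.xml.xpath calls, might look
like the following. The class name XPathReuse and its methods are hypothetical,
and the sketch assumes single-threaded use, since XPathFactory and XPathExpression
are not guaranteed to be thread-safe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical wrapper that creates the factory once and compiles a path once. */
final class XPathReuse {
    // Created a single time; XPathFactory.newInstance() is the costly lookup.
    private static final XPathFactory FACTORY = XPathFactory.newInstance();

    private final XPathExpression expression;

    XPathReuse(String path) {
        try {
            // Compile once so later evaluations skip expression parsing.
            this.expression = FACTORY.newXPath().compile(path);
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot compile " + path, ex);
        }
    }

    /** Evaluates the compiled path and returns the string value of the result. */
    String evaluate(Document document) {
        try {
            return expression.evaluate(document);
        } catch (Exception ex) {
            throw new IllegalStateException("Evaluation failed", ex);
        }
    }

    /** Small helper for trying the sketch out: parses an XML string into a DOM. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("Cannot parse XML", ex);
        }
    }
}
```

A caller would create new XPathReuse("//SomeElementName") once and call
evaluate(document) as often as needed. This only removes the factory and
compilation overhead; it does not change what happens inside each evaluation.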

----

I have debugged and profiled my test case and Xalan/JAXP in general. I managed to
identify the main bottleneck in

org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()

It can be seen that every one of the 10k test XPath evaluations led to the
classloader trying to look up the DTMManager instance in some sort of default
configuration. This configuration is not loaded into memory but accessed every time.
Furthermore, this access seems to be protected by a lock on ObjectFactory.class
itself. When the access fails (which it does by default), the configuration is
loaded from the xalan.jar file's

META-INF/services/org.apache.xml.dtm.DTMManager

configuration file. Every time!

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
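
Where the JVM command line cannot be changed, the same property could presumably
be set in code before the first XPath evaluation. The sketch below assumes that
the lookup reads the system property at evaluation time, as described above; I
have not verified this against every Xalan version. The class name
DtmManagerProperty is hypothetical. For the JDK-internal copy, the
com.sun.org.apache.xml.internal key and value shown above would be used instead.

```java
/** Hypothetical helper: sets the DTMManager property unless one is already configured. */
final class DtmManagerProperty {
    // Key and value are taken from the JVM parameter shown above.
    static final String KEY = "org.apache.xml.dtm.DTMManager";
    static final String VALUE = "org.apache.xml.dtm.ref.DTMManagerDefault";

    /** Keeps an existing setting, otherwise applies VALUE; returns the effective value. */
    static String ensureConfigured() {
        String current = System.getProperty(KEY);
        if (current == null) {
            System.setProperty(KEY, VALUE);
            current = VALUE;
        }
        return current;
    }
}
```

Calling DtmManagerProperty.ensureConfigured() early, for example in a static
initializer of the class that wraps the XPath calls, keeps an explicit -D setting
intact while supplying the default otherwise.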

So here's a performance improvement overview for 10k consecutive XPath evaluations of
//SomeNodeName against a 90k XML file (measured with System.nanoTime()):

measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3   
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |        n/a   |
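
For anyone who wants to repeat such a measurement, a timing loop based on
System.nanoTime() can be sketched as follows. This is not the harness used for the
numbers above; the class name NanoTimer and its method are hypothetical, and the
sketch does no JIT warm-up, so absolute numbers should be read with care.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical timing helper built on System.nanoTime(). */
final class NanoTimer {
    /** Runs the task the given number of times and returns the elapsed time in ms. */
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        // nanoTime() is only meaningful as a difference between two readings.
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

A call such as NanoTimer.timeMillis(10_000, () -> runOneEvaluation()) would time
10k evaluations, where runOneEvaluation() is a placeholder for whichever variant
from the table is being measured.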

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
