[prev in list] [next in list] [prev in thread] [next in thread]
List: xalan-dev
Subject: [jira] [Created] (XALANJ-2540) Very inefficient default behaviour
From: "Lukas Eder (JIRA)" <xalan-dev () xml ! apache ! org>
Date: 2011-06-14 13:56:47
Message-ID: 146744869.2665.1308059807321.JavaMail.tomcat () hel ! zones ! apache ! org
[Download RAW message or body]
Very inefficient default behaviour for looking up DTMManager
------------------------------------------------------------
Key: XALANJ-2540
URL: https://issues.apache.org/jira/browse/XALANJ-2540
Project: XalanJ2
Issue Type: Improvement
Security Level: No security risk; visible to anyone (Ordinary problems in Xalan \
projects. Anybody can view the issue.) Components: DTM, XPath
Affects Versions: 2.7, 2.7.1
Reporter: Lukas Eder
I have analysed an issue that has been bothering me for some time. When executing \
XPath evaluations, it looks like a very significant amount of time is spent in the \
initialisation of the XPathContext. I have asked this question on Stack Overflow and \
answered it myself:
http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance
I think the default behaviour of
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal and \
should be improved, statically. I imagine, it is unlikely that this configuration is \
going to change once classes have been loaded. Hence, the fallback lookup of \
META-INF/service/org.apache.xml.dtm.DTMManager should only be done once.
For reference, here's the question and answer again in JIRA:
----
I have come to an astonishing conclusion that this:
Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
String result = ((Element) e).getTextContent();
Seems to be an incredible 100x faster than this:
// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();
// Negligible
XPath xpath = factory.newXPath();
// Accounts for 70% (caching a compiled expression doesn't change much...)
String result = (String) xpath.evaluate(
"//SomeElementName", document, XPathConstants.STRING);
I'm using the JVM's default implementation of JAXP:
org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl
I'm really confused, because it's easy to see how JAXP could optimise the above XPath \
query to actually execute a simple getElementsByTagName() instead. But it doesn't \
seem to do that. This problem is limited to around 5-6 frequently used XPath calls, \
that are abstracted and hidden by an API. Those queries involve simple paths (e.g. \
/a/b/c, no variables, conditions) against an always available DOM Document only. So, \
if an optimisation can be done, it will be quite easy to achieve.
----
I have debugged and profiled my test-case and Xalan/JAXP in general. I managed to \
identify the big major problem in
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()
It can be seen that every one of the 10k test XPath evaluations led to the \
classloader trying to lookup the DTMManager instance in some sort of default \
configuration. This configuration is not loaded into memory but accessed every time. \
Furthermore, this access seems to be protected by a lock on the ObjectFactory.class \
itself. When the access fails (by default), then the configuration is loaded from the \
xalan.jar file's
META-INF/service/org.apache.xml.dtm.DTMManager
configuration file. Every time!:
Fortunately, this behaviour can be overridden by specifying a JVM parameter like \
this:
-Dorg.apache.xml.dtm.DTMManager=
org.apache.xml.dtm.ref.DTMManagerDefault
or
-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=
com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
So here's a performance improvement overview for 10k consecutive XPath evaluations of \
//SomeNodeName against a 90k XML file (measured with System.nanoTime():
measured library : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation : 10400ms | 4717ms | | 25500ms
reusing XPathFactory : 5995ms | 2829ms | |
reusing XPath : 5900ms | 2890ms | |
reusing XPathExpression : 5800ms | 2915ms | 16000ms | 25000ms
adding the JVM param : 1163ms | 761ms | n/a |
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: xalan-dev-unsubscribe@xml.apache.org
For additional commands, e-mail: xalan-dev-help@xml.apache.org
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic