
List:       xalan-dev
Subject:    [jira] [Commented] (XALANJ-2540) Very inefficient default behaviour for looking up DTMManager
From:       "Matthew Broadhead (JIRA)" <jira () apache ! org>
Date:       2018-05-23 9:21:00
Message-ID: JIRA.12510313.1308059788000.13320.1527067260491 () Atlassian ! JIRA


    [ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486962#comment-16486962 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

[~garydgregory] On your recommendation I have cloned 2.7.1, as that is the highest
maintenance release I can see. I have grepped for 2.7.3 but cannot see it
mentioned in any of the files.

There is no pom.xml or anything similar, so it looks like the build is manual?

[~msahyoun] Thanks, I have looked in ObjectFactory and see where it is doing
lookUpFactoryClassName(). Do you think it is possible to cache the result in a
singleton for future requests? Or might this cause clashes?

I could submit a patch for the singleton suggestion, but I am not sure how to build and
deploy the project.
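
To make the singleton idea concrete, here is a rough sketch of what I mean. It is
not Xalan code: CachedFactoryLookup and the slowLookup function are hypothetical
names standing in for whatever lookUpFactoryClassName() does today, and it assumes
Java 8 (lambdas, ConcurrentHashMap), which may be newer than what the 2.7.x branch
targets.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper, not part of Xalan: memoises the result of an
 * expensive factory class name lookup, keyed by factory id.
 */
final class CachedFactoryLookup {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the real (slow) lookup; an assumption for illustration.
    private final Function<String, String> slowLookup;

    CachedFactoryLookup(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    /**
     * Runs the slow lookup at most once per factory id and then serves the
     * cached name. If the lookup returns null, nothing is cached and the
     * next call will try again.
     */
    String classNameFor(String factoryId) {
        return cache.computeIfAbsent(factoryId, slowLookup);
    }
}
```

On the clashes question: this sketch keys the cache by factory id only. If the
real lookup can give different answers per class loader (for example in an
application server), the key would probably need to include the class loader too.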

> Very inefficient default behaviour for looking up DTMManager
> ------------------------------------------------------------
> 
> Key: XALANJ-2540
> URL: https://issues.apache.org/jira/browse/XALANJ-2540
> Project: XalanJ2
> Issue Type: Improvement
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan
>                 projects. Anybody can view the issue.)
> Components: DTM, XPath
> Affects Versions: 2.7.1, 2.7
> Reporter: Lukas Eder
> Priority: Major
> 
> I have analysed an issue that has been bothering me for some time. When executing
> XPath evaluations, it looks like a very significant amount of time is spent in the
> initialisation of the XPathContext. I have asked this question on Stack Overflow
> and answered it myself:
> http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance
> 
> I think the default behaviour of
> org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal and
> should be improved, statically. I imagine it is unlikely that this configuration
> is going to change once classes have been loaded. Hence, the fallback lookup of
> META-INF/services/org.apache.xml.dtm.DTMManager should only be done once. For
> reference, here's the question and answer again in JIRA:
> ----
> I have come to an astonishing conclusion that this:
> 
>     Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
>     String result = ((Element) e).getTextContent();
> 
> seems to be an incredible 100x faster than this:
> 
>     // Accounts for 30%, can be cached
>     XPathFactory factory = XPathFactory.newInstance();
>     // Negligible
>     XPath xpath = factory.newXPath();
>     // Accounts for 70% (caching a compiled expression doesn't change much...)
>     String result = (String) xpath.evaluate(
>         "//SomeElementName", document, XPathConstants.STRING);
> 
> I'm using the JVM's default implementation of JAXP:
> 
>     org.apache.xpath.jaxp.XPathFactoryImpl
>     org.apache.xpath.jaxp.XPathImpl
> 
> I'm really confused, because it's easy to see how JAXP could optimise the above
> XPath query to actually execute a simple getElementsByTagName() instead. But it
> doesn't seem to do that. This problem is limited to around 5-6 frequently used
> XPath calls, that are abstracted and hidden by an API. Those queries involve simple
> paths (e.g. /a/b/c, no variables, conditions) against an always available DOM
> Document only. So, if an optimisation can be done, it will be quite easy to
> achieve.
> ----
> I have debugged and profiled my test-case and Xalan/JAXP in general. I managed to
> identify the major problem in
> org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName(). It can be seen that every
> one of the 10k test XPath evaluations led to the classloader trying to look up the
> DTMManager instance in some sort of default configuration. This configuration is
> not loaded into memory but accessed every time. Furthermore, this access seems to
> be protected by a lock on the ObjectFactory.class itself. When the access fails (by
> default), then the configuration is loaded from the xalan.jar file's
> META-INF/services/org.apache.xml.dtm.DTMManager configuration file. Every time!
> 
> Fortunately, this behaviour can be overridden by specifying a JVM parameter like
> this:
> -Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault
> or
> -Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
> So here's a performance improvement overview for 10k consecutive XPath evaluations
> of //SomeNodeName against a 90k XML file (measured with System.nanoTime()):
> 
> measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
> --------------------------------------------------------------------------------
> without optimisation    :     10400ms |      4717ms |              |     25500ms
> reusing XPathFactory    :      5995ms |      2829ms |              |
> reusing XPath           :      5900ms |      2890ms |              |
> reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
> adding the JVM param    :      1163ms |       761ms |        n/a   |
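
For anyone who cannot change the JVM command line, the workarounds quoted above
(the DTMManager property and reusing a compiled XPathExpression) might be applied
from code roughly as in the sketch below. This is only an illustration under some
assumptions: XPathWorkaroundSketch and its method names are made up, the property
names and values are the ones quoted in the issue, and whether a given JAXP
implementation still consults them depends on its version. The properties would
need to be set before the first XPath evaluation to have any effect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical helper for illustration; not part of Xalan or JAXP. */
final class XPathWorkaroundSketch {

    /** Sets the two system properties quoted in the issue, unless already set. */
    static void pinDtmManager() {
        setIfAbsent("org.apache.xml.dtm.DTMManager",
                    "org.apache.xml.dtm.ref.DTMManagerDefault");
        setIfAbsent("com.sun.org.apache.xml.internal.dtm.DTMManager",
                    "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault");
    }

    private static void setIfAbsent(String key, String value) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Compiles the expression once, evaluates it `runs` times, returns the last result. */
    static String evaluateRepeatedly(Document doc, String expression, int runs) {
        try {
            XPathExpression compiled =
                XPathFactory.newInstance().newXPath().compile(expression);
            String result = null;
            for (int i = 0; i < runs; i++) {
                result = (String) compiled.evaluate(doc, XPathConstants.STRING);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

Calling pinDtmManager() once at startup and wrapping evaluateRepeatedly() with
System.nanoTime() before and after should be enough to compare timings locally.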



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


