
List:       xerces-j-dev
Subject:    Re: Questions about XML Parser for Java
From:       keshlam () us ! ibm ! com
Date:       2007-08-01 15:26:57
Message-ID: OF400E2049.66077C64-ON8525732A.00537E1D-8525732A.0054DD85 () lotus ! com

>Would you be so kind as to provide me a rough estimate of the man hours
>that expended in developing the XML Parser

Probably not possible, but it's a significant number of man-years.

Xerces started off as an early prototype of IBM's XML4J parser, which went
through several complete redesigns and reimplementations, API changes,
changes in the validation scheme... Heck, the DOM implementation alone is
probably multiple man-years during that stage, since the first DOM
implementation was discarded in favor of one I wrote, which then underwent
a lot of further evolution. Work on that was done across multiple IBM
groups from Tokyo to California to New York to Toronto to wherever. I
really doubt anyone was attempting to track total time investment.

And of course once Xerces hit Apache, and we started getting contributions
from the open source community, any pretense of time tracking would have
gone right out the window.

Could a parser be written in less time? Sure; a lot of the time was spent
in helping the standards to evolve, and a lot was spent in performance
tuning, and Xerces supports things that your particular application may not
need (the downside of being a generally useful tool is that one has to
invest in being general). And the requirements for an XML parser are better
understood these days. But writing a parser that you'll be happy using is
still not a trivial exercise; the devil really is in the details.


>We have noted that saving an XML file as an Excel file gets you an Excel
>file that seems to have been parsed in some manner. [...] I wonder if you
>would be willing to comment on the differences between what XML4J would
>provide and what Excel provides for some particular XML file.

I'm sorry, but that question really doesn't make a lot of sense. It's like
asking what the difference is between a motor and a washing machine.

Excel is a particular application. It supports a particular XML-based
markup language as one of its file export/import syntaxes, and therefore
must contain at least a limited XML serializer and parser. (It may not be
fully general, since its developers know a priori exactly what kind of XML
they intend to generate and process.)

XML4J/Xerces is a general-purpose XML parser for invocation from
applications. It converts between XML syntax and the standard APIs for
working with XML (DOM, SAX, etc.), as well as performing validation against
DTDs and/or schemas that describe the particular XML-based markup language
you are working with. Xerces can be used as a building block for any
application which needs to read or write data represented in XML.
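
To make the "building block" point a bit more concrete, here is a minimal
sketch of an application handing XML text to a parser and getting a DOM tree
back. It assumes the standard JAXP entry points (javax.xml.parsers), which
Xerces implements; the class name DomSketch, the method firstCellText, and
the <row>/<cell> element names are hypothetical and chosen only for
illustration. It is not meant as the one right way to drive Xerces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {
    // Parses the XML text into a DOM tree and returns the text of the
    // first <cell> element, or null if the document has none.
    // The element name "cell" is a made-up example, not a real format.
    static String firstCellText(String xml) throws Exception {
        // JAXP picks whichever parser implementation is available;
        // with Xerces on the classpath, that can be Xerces.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        // XML syntax in, DOM Document out.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // From here on the application works with the DOM API, not with
        // angle brackets.
        NodeList cells = doc.getElementsByTagName("cell");
        return cells.getLength() == 0 ? null : cells.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<row><cell>42</cell><cell>hello</cell></row>";
        System.out.println(firstCellText(xml));
    }
}
```

What the application does with the resulting tree (spreadsheet import, in
Excel's case) is entirely its own business; the parser only gets you from
XML syntax to the API.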



______________________________________
"... Three things see no end: A loop with exit code done wrong,
A semaphore untested, And the change that comes along. ..."
  -- "Threes" Rev 1.1 - Duane Elms / Leslie Fish
(http://www.ovff.org/pegasus/songs/threes-rev-11.html)




