List:       xmlbeans-dev
Subject:    Re: Entire XML in memory?
From:       robert burrell donkin <robertburrelldonkin () blueyonder ! co ! uk>
Date:       2004-01-18 12:29:09
Message-ID: EA584B8D-49B1-11D8-B471-003065DC754C () blueyonder ! co ! uk

this seems (to me, at least) to be analogous to the arguments about 
O/R mapping. (so hopefully i'll cut to the chase without rehashing those 
old arguments in a new context :)

if the aim is simply to push a given xml document into a database, then 
direct loading makes a lot more sense than using an intermediary object 
layer. but this is really a SQL <-> xml mapping (rather than object <-> 
xml). IMHO this kind of capability would be better provided by a fast, 
direct SAX application than by a sophisticated object <-> xml mapper 
(such as xmlbeans).
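as a rough sketch of what i mean by a fast, direct SAX loader (the 
<row> element, the attribute names and the table here are invented for 
illustration, and a real loader would bind a PreparedStatement rather 
than build SQL text):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Direct SAX -> SQL loading: no intermediate object graph is built,
// each <row> element is turned straight into one INSERT.
public class SaxToSql extends DefaultHandler {
    final List<String> inserts = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) {
        if ("row".equals(qName)) {
            // a real loader would use a PreparedStatement here
            inserts.add("INSERT INTO items (name, price) VALUES ('"
                    + atts.getValue("name") + "', "
                    + atts.getValue("price") + ")");
        }
    }

    public static List<String> load(String xml) throws Exception {
        SaxToSql handler = new SaxToSql();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                handler);
        return handler.inserts;
    }

    public static void main(String[] args) throws Exception {
        List<String> sql = load(
                "<items><row name='ant' price='1'/>"
                + "<row name='bee' price='2'/></items>");
        sql.forEach(System.out::println);
    }
}
```

the point is that memory use stays flat no matter how big the document 
is, because nothing is retained between rows.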

but then again, i'd advocate using a xml database rather than a 
relational one so that there's no mapping needed :)

one reason why object layers are advocated is that complex business 
logic is easier to maintain when it's encoded in this way than as 
procedural transaction scripts. IMHO xmlbeans should focus on this kind 
of application. IMHO applications that need to apply complex business 
logic before persisting the results of the processing to the database 
are probably willing to pay the additional price for the xml -> object 
-> relational persistence. (that's not to say that this process 
shouldn't be optimized where possible.)

there is one particular kind of use case i have in mind where garbage 
collection issues become important. the bulk of the xml document being 
processed consists of many repeats of an element containing a subgraph.

for example:

<root>
	<header>...</header>
	<unit-of-work>
		...
	</unit-of-work>
	...
	<unit-of-work>
		...
	</unit-of-work>
</root>

let's say that the unit-of-work occurs 1000 times and contains a bulky 
subgraph. the processing consists of reading the <header> information 
and then processing each <unit-of-work> independently in turn. (this 
has a parallel case where a large document is being written out as 
xml.)

in this case, it's not efficient to load the entire document into 
memory before processing it. in fact, the processing can begin as soon 
as the first <unit-of-work> has been read, before the entire document 
has been read from io.

one solution (for which i use the term 'partial mapping') is to provide 
an event driven system whereby the mapper delivers an object model for 
each <unit-of-work> for processing as soon as it's ready.
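a minimal sketch of that event driven idea, using the StAX pull API 
for brevity (the element names follow the example above; a real mapper 
would deliver a typed object rather than the raw text):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// 'partial mapping' on the read side: pull-parse the document and hand
// each <unit-of-work> to a callback as soon as it has been read, so at
// most one unit is in memory at a time.
public class PartialMapper {
    public static void eachUnit(String xml, Consumer<String> callback)
            throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new ByteArrayInputStream(
                        xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder unit = null;
        while (r.hasNext()) {
            int ev = r.next();
            if (ev == XMLStreamConstants.START_ELEMENT
                    && "unit-of-work".equals(r.getLocalName())) {
                unit = new StringBuilder();
            } else if (ev == XMLStreamConstants.CHARACTERS && unit != null) {
                unit.append(r.getText());
            } else if (ev == XMLStreamConstants.END_ELEMENT
                    && "unit-of-work".equals(r.getLocalName())) {
                callback.accept(unit.toString().trim());
                unit = null;   // the unit is now free for gc
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String doc = "<root><header>h</header>"
                + "<unit-of-work>first</unit-of-work>"
                + "<unit-of-work>second</unit-of-work></root>";
        List<String> seen = new ArrayList<>();
        eachUnit(doc, seen::add);
        System.out.println(seen);
    }
}
```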

the converse partial mapping case is writing a document with such a 
structure. here the task is creating a big xml document with a similar 
repetitive structure where each unit-of-work is independent. you map 
each unit in turn and can then free up the objects for garbage 
collection.
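a matching sketch of the write side (again using the streaming 
javax.xml.stream API for brevity, with the units reduced to plain 
strings for illustration):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// 'partial mapping' on the write side: each unit-of-work is streamed
// out as soon as it is mapped, so the source object for a unit is
// collectable before the next one is written.
public class PartialWriter {
    public static String write(String[] units) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance()
                .createXMLStreamWriter(out);
        w.writeStartElement("root");
        w.writeStartElement("header");
        w.writeCharacters("...");
        w.writeEndElement();
        for (String unit : units) {          // map one unit at a time
            w.writeStartElement("unit-of-work");
            w.writeCharacters(unit);
            w.writeEndElement();
            // nothing retains 'unit' past this point
        }
        w.writeEndElement();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write(new String[] {"a", "b"}));
    }
}
```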

- robert

On 16 Jan 2004, at 00:25, Gerald B. Rosenberg wrote:

> I, too, am quite interested in this issue.
>
> My fundamental question is why any of an XML document should exist 
> outside of the database in order to be worked upon.  Instead, why 
> can't -- indeed why shouldn't -- v2 be (also) architected to leverage 
> the existing XML 
> handling capabilities of the hybrid databases, such as Oracle 9i and 
> DB2.  O9i in particular provides native/raw XPath, XQuery and cursor 
> capabilities using an SQL-centric syntax.  Using these native XML/DB 
> capabilities would preserve the indexing, concurrency control, access 
> rights, etc. provided by the database.  By instantiating even only a 
> block of the XML outside of the database, it seems you have to lock 
> the corresponding portion of the database while providing your own 
> XPath/XQuery/XMLCursor implementations.  That does not seem to be an 
> efficient use of the database.
>
> Of course not all DBs have these native XML capabilities.  Just my 
> opinion that the current weight of development -- not scientific, just 
> a bunch of web browsing and reading -- is tending toward integrating 
> XML directly into the DBs, which is helped by XPath, XQuery and cursor 
> operations being relatively well-accepted/well-defined standards.
>
> Just as a reference point, my use-case scenario is a corpus of several 
> thousand XML documents with multiple concurrent processes performing a 
> progressive, essentially statistical analysis and update of the 
> documents.  Each process will need to reference multiple documents at 
> a time to perform group analysis.  At least one other process will be 
> adding documents to the corpus.  Each document is likely to be fairly 
> large: 100 - 300kB.  (If the documents were not fundamentally XML, 
> this would be a quite standard relational application.)
>
> XMLBeans looks to provide the java-centric programming model that is 
> most appropriate for my needs.  However, I am concerned with the 
> apparent absence of concurrency capabilities and failure to take 
> advantage of the DB indexing/performance capabilities.
>
> Can you please comment on how (if) v2 will address these issues?  
> Have you yet established the back-end API?  If you have, I would 
> really 
> like to take a look at it.
>
> Best,
> Gerald
>
>
>
>
>
> At 02:02 PM 1/15/2004, you wrote:
>
>> In v1, the document exists entirely in memory.  In v2, I am making the
>> xml store backend pluggable so that one may provide alternate backends
>> with different characteristics.  In addition to a high performance,
>> in-memory store (the default), I will make a memory mapped backend 
>> which
>> stores the XML in a memory mapped file and does not consume VM memory
>> proportional to the size of the XML being worked on, which will 
>> allow one to load very large documents.  I can imagine backends which
>> incrementally load parts of a document.
>>
>> This architecture is very much incomplete and in flux.  Unless you 
>> want
>> to be chasing my changes, I would give it a few months.
>>
>> - Eric
>>
>> -----Original Message-----
>> From: Matthew Bishop [mailto:matt@thebishops.org]
>> Sent: Thursday, January 15, 2004 1:17 PM
>> To: xmlbeans-dev@xml.apache.org
>> Subject: RE: Entire XML in memory?
>>
>> I'm interested in the answer to this one as well, though I think the
>> answer
>> is no, the whole doc is not in memory.  It may have different answers
>> depending on how one navigates the instance, however.
>>
>> My interest is using XMLBeans in a J2ME PersonalProfile application.
>> I've
>> just started this today, and my initial impression is that v2's design
>> goals
>> are more in line with what I need to do.  Without reading the entire
>> list,
>> can I get a feel for how far along v2 is?
>>
>> --
>>
>> Matt Bishop
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:   xmlbeans-dev-unsubscribe@xml.apache.org
>> For additional commands, e-mail: xmlbeans-dev-help@xml.apache.org
>> Apache XMLBeans Project -- URL: http://xml.apache.org/xmlbeans/
>>
>>
>
>


