
List:       koffice-devel
Subject:    Re: xml:id preservation and RDF metadata support for ODF
From:       Ben Martin <monkeyiq () users ! sourceforge ! net>
Date:       2009-11-22 17:54:00
Message-ID: 1258912440.13235.131.camel () sam ! localdomain


On Sat, 2009-11-21 at 04:58 +0100, Thorsten Zachmann wrote: 
> Hello,
> 
> On Friday 20 November 2009 09:27:50 Ben Martin wrote:
> > Abstract:
> >   The non-RDF question is about preserving xml:id on ODF across
> > load/save cycles.
> > 
> > Hi,
> >   I'm adding support for loading and saving RDF metadata in ODF
> > documents for kword. Of course, RDF support can also be added for other
> > ODF documents in the KOffice suite, such as spreadsheets; I've tried to
> > make what I've done generic so it can be extended in that way.
> > I'm focusing on kword right now.
> > 
> >   I've mostly got external manifest.rdf based metadata working, but I'm
> > having a few problems with so called inline RDF. From the ODF
> > specification, inline RDF follows a slightly askew subset of RDFa. This
> > means that one can have xhtml:about, xhtml:property and xhtml:content
> > associated with an XML node and these are used to generate an RDF triple
> > during loading. I mostly have loading of such metadata working, with a
> > few little corner cases to address.
> > 
> >   The two problems I have found are that KOffice discards xml:id
> > attributes during load and that I need to track the node that inline RDF
> > gets generated from in order to store it back properly on document save.
> > These problems are somewhat intertwined, because if xml:ids can be
> > preserved for input documents, then saving the xhtml:about etc. could
> > also be done in the same code.
> > 
> >   I thought I'd ask the list for advice because adding xml:id support
> > seems like a fairly intrusive change, and those currently more familiar
> > with the codebase might have some ideas as to how best to support it.
> > The xhtml:about attribute can appear on many XML elements including
> > text:p.
> > 
> > From my investigations, the pertinent code is in the class KoTextLoader.
> > Looking at KoTextLoader::loadParagraph(), if there is a style applied
> > then KoParagraphStyle is used; then loadSpan() is called, and if the
> > text:p has no style and only a text child node, cursor.insertText() is
> > called. This means there are many choices for where to store an xml:id
> > for the text:p, but none of them is straightforward.
> 
> I don't think KoParagraphStyle is a good place to store this, as it can
> be reused for different paragraphs. Maybe someone more familiar with KoText
> can comment.

Yes, the biggest impediment is the mismatch between the ODF document
structure and the KOffice in-memory representation of a document. As
mentioned, the text:p is dissolved on document load.

Another thought I had was some container-like object, or a start+end
marker pair, which has no effect on the document itself but which could
store the xml:id and, because it exists in the document, would expand,
contract and be copied and pasted along with the text. There are all
sorts of charming corner cases there too; clearly a paste would need to
modify the xml:id so it is still document-unique.
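To make the marker-pair idea a bit more concrete, here is a minimal C++ sketch. It assumes plain character offsets rather than QTextCursor/QTextDocument positions, and the names `AnchorRange`, `AnchorRegistry` and the "-copyN" suffix scheme for pasted ids are all hypothetical, not existing KOffice API. It shows the range growing and shrinking with edits and a paste getting a fresh document-unique xml:id.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical non-displaying marker pair: [start, end) character offsets
// into the document text, plus the xml:id to write out on save.
struct AnchorRange {
    std::string xmlId;
    std::size_t start;
    std::size_t end;
};

class AnchorRegistry {
public:
    void add(const std::string& xmlId, std::size_t start, std::size_t end) {
        assert(start <= end && usedIds_.count(xmlId) == 0);
        usedIds_.insert(xmlId);
        anchors_.push_back({xmlId, start, end});
    }

    const AnchorRange* find(const std::string& xmlId) const {
        for (const auto& a : anchors_)
            if (a.xmlId == xmlId) return &a;
        return nullptr;
    }

    // n characters inserted at pos. Text typed exactly at an anchor's start
    // lands before it; text typed at its end (or inside) extends it.
    void textInserted(std::size_t pos, std::size_t n) {
        for (auto& a : anchors_) {
            if (pos <= a.start) { a.start += n; a.end += n; }
            else if (pos <= a.end) { a.end += n; }
        }
    }

    // n characters removed starting at pos; anchors shrink or collapse.
    void textRemoved(std::size_t pos, std::size_t n) {
        auto adjust = [pos, n](std::size_t x) -> std::size_t {
            if (x <= pos) return x;
            if (x <= pos + n) return pos;
            return x - n;
        };
        for (auto& a : anchors_) {
            a.start = adjust(a.start);
            a.end = adjust(a.end);
        }
    }

    // A pasted copy covers the same length of text but must get a fresh,
    // document-unique xml:id; here a "-copyN" suffix is appended.
    std::string pasteCopy(AnchorRange original, std::size_t newStart) {
        std::string id = original.xmlId;
        for (int i = 1; usedIds_.count(id) != 0; ++i)
            id = original.xmlId + "-copy" + std::to_string(i);
        add(id, newStart, newStart + (original.end - original.start));
        return id;
    }

private:
    std::vector<AnchorRange> anchors_;
    std::set<std::string> usedIds_;
};
```

Which side of a boundary an insertion falls on is exactly one of those corner cases; the choice above (start is exclusive, end is inclusive) is arbitrary.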

Upon further reflection, there are two distinct cases that I need to
address:

A) The xml:id is referred to from RDF which was either stored in
content.xml, manifest.rdf or one of the RDF/XML files referenced by
manifest.rdf. The link needs to be traversable once the document is
loaded (so you can find out what the RDF is referring to), and if the
xml:id changes when saving, the RDF has to be updated with
s/old-xml:id/new-xml:id/g.
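The save-time fix-up for case (A) amounts to renaming the objects of idref triples. Here is a minimal C++ sketch, assuming the references appear as the object of a pkg:idref triple and that the save code can produce an old-to-new xml:id map; `Triple`, `kIdRef` and `remapIdRefs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Hypothetical predicate name for the link from an RDF subject to an
// xml:id in content.xml.
const std::string kIdRef = "pkg:idref";

// Apply s/old-xml:id/new-xml:id/ to every pkg:idref object whose id was
// renamed during save; ids that were not renamed are left untouched.
// Returns the number of triples changed.
std::size_t remapIdRefs(std::vector<Triple>& triples,
                        const std::map<std::string, std::string>& oldToNew) {
    std::size_t changed = 0;
    for (auto& t : triples) {
        if (t.predicate != kIdRef) continue;
        auto it = oldToNew.find(t.object);
        if (it != oldToNew.end() && it->second != t.object) {
            t.object = it->second;
            ++changed;
        }
    }
    return changed;
}
```

Matching whole values, rather than doing a literal textual s///g over the RDF/XML, avoids rewriting ids that merely contain the old id as a substring (e.g. "id_world2").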

B) There is some inline RDF which defines the object using element
scope. See OpenDocument-v1.2-part1-cd03.pdf, section 18.907 on page
630. For example:
<text:p xhtml:about="uri-a" xhtml:property="waffle">
  The text between the start and end of the text:p will form the object
of the RDF triple. This is tricky because the user might edit the text,
cut, paste or perform any number of other operations, including ones that
apply a style across the text:p boundary itself.
</text:p>
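As a rough illustration of the loading side, the following minimal C++ sketch turns such an element into a triple. It assumes the attributes are already parsed into a map and that, as in the example above, a missing xhtml:content makes the element's own text the object. It ignores CURIE/prefix expansion and xhtml:datatype. `InlineRdfElement`, `Triple` and `tripleFromInlineRdf` are hypothetical names, not KOffice API.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified view of a loaded element: attributes plus text.
struct InlineRdfElement {
    std::map<std::string, std::string> attributes;
    std::string textContent;
};

struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;  // kept as a plain literal in this sketch
};

// Returns a triple if the element carries both xhtml:about and
// xhtml:property. If xhtml:content is absent, the element's text is used
// as the object. Prefix expansion and xhtml:datatype are not handled.
std::optional<Triple> tripleFromInlineRdf(const InlineRdfElement& e) {
    auto about = e.attributes.find("xhtml:about");
    auto property = e.attributes.find("xhtml:property");
    if (about == e.attributes.end() || property == e.attributes.end())
        return std::nullopt;

    auto content = e.attributes.find("xhtml:content");
    std::string object = (content != e.attributes.end()) ? content->second
                                                          : e.textContent;
    return Triple{about->second, property->second, object};
}
```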

For case (B), because loading a text:p just passes its text to
cursor.insertText(), the exact start/end scope of the text:p goes
missing. There doesn't currently seem to be any correct place to attach
an anchor saying "this is uri-a" or "this is where uri-a starts and
ends". In this case, the about and property attributes would need to be
saved back to a text:p when the document is saved again.
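One way to keep that scope is to note the offsets just before and after the text is inserted on load, then write them back out on save. The following is a minimal sketch under heavy assumptions: the loaded document is modelled as a flat std::string instead of a QTextDocument, and `ParagraphScope`, `LoadedDocument`, `escapeXml` and `saveScopes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of where a text:p's inline RDF applied in the flat
// loaded text, since the text:p element itself is dissolved on load.
struct ParagraphScope {
    std::size_t start;
    std::size_t end;  // exclusive
    std::string about;
    std::string property;
};

// Stand-in for the loaded document: a flat string of text.
struct LoadedDocument {
    std::string text;
    std::vector<ParagraphScope> scopes;

    // Mirrors the insertText() step on load, but records the offsets
    // before and after so the text:p scope is not lost.
    void loadParagraph(const std::string& content, const std::string& about,
                       const std::string& property) {
        std::size_t start = text.size();
        text += content;
        if (!about.empty() && !property.empty())
            scopes.push_back({start, text.size(), about, property});
    }
};

std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default: out += c;
        }
    }
    return out;
}

// Writes only the RDF-carrying paragraphs back as text:p elements with
// the xhtml:about / xhtml:property attributes they were loaded with; a
// real save would interleave them with the rest of the content.
std::string saveScopes(const LoadedDocument& doc) {
    std::string xml;
    for (const auto& s : doc.scopes) {
        assert(s.start <= s.end && s.end <= doc.text.size());
        xml += "<text:p xhtml:about=\"" + escapeXml(s.about) +
               "\" xhtml:property=\"" + escapeXml(s.property) + "\">" +
               escapeXml(doc.text.substr(s.start, s.end - s.start)) +
               "</text:p>";
    }
    return xml;
}
```

The offsets are only valid until the first edit, so on their own they do not address the cut/paste/restyle problems described above.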


> 
> > Looking at other elements, it seems that KoInline and KoShape, for example,
> > would also need some augmentation to store an xml:id if one was supplied
> > in the input ODF file, so that the xml:id can be preserved on save.
> > 
> > I look forward to any thoughts or preferences on how the koffice hackers
> > would like to see the xml:id support added...
> > 
> 
> thanks for the very detailed explanation. I think there are two possible 
> solutions:
> 
> 1. Don't store the xml:id of the element but attach the metadata to it. This
> could be done by having a class that holds all the metadata for one element
> and storing it with that element.

I failed to mention in the initial email that according to the spec the
xml:id can be referenced from the RDF stored in manifest.rdf and others.
So the xml:id needs to be able to be resolved to the relevant chunk of
the loaded document if some of the RDF statements are to be used.

Currently, I store the loaded inline RDF (from content.xml) using the
same class that handles manifest.rdf. By storing all RDF in one central
place it can be easily queried regardless of where it came from and
where it is to be saved back to. Using graph context, the provenance is
tracked, allowing proper placement during save. But of course, when
saving content.xml again, any RDF that was loaded from it should go back
into it. Ideally, the same XML elements that had the RDF on them during
load should have it on them during save, but some of those, like text:p,
do not have a 1-1 object in the loaded KOffice document.
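The graph-context idea can be sketched as a store of quads, where the fourth element records which file a triple was loaded from. This is a minimal C++ illustration, not the class described above; `Quad` and `RdfStore` are hypothetical names, and using the file name as the context value is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A triple plus the graph context recording where it was loaded from,
// e.g. "content.xml" for inline RDF or "manifest.rdf".
struct Quad {
    std::string subject;
    std::string predicate;
    std::string object;
    std::string context;
};

// Hypothetical single store for all RDF, regardless of origin.
class RdfStore {
public:
    void add(const Quad& q) { quads_.push_back(q); }

    // Query across all sources, ignoring provenance.
    std::vector<Quad> withSubject(const std::string& subject) const {
        std::vector<Quad> out;
        for (const auto& q : quads_)
            if (q.subject == subject) out.push_back(q);
        return out;
    }

    // On save, each output file gets back exactly the triples from it.
    std::vector<Quad> forContext(const std::string& context) const {
        std::vector<Quad> out;
        for (const auto& q : quads_)
            if (q.context == context) out.push_back(q);
        return out;
    }

private:
    std::vector<Quad> quads_;
};
```

Selecting the "content.xml" context tells the saver which triples belong in content.xml, but not which element to put them on; that still needs a document object for each text:p.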



> 2. Have a map with the xml:id to metadata.

Having a map from xml:id to the metadata is easy enough, it is when I
want to get the document object for an xml:id that I run into trouble.

For example, an RDF triple (from manifest.rdf) might look like
{[uri:example-element-2], [pkg:idref], "id_world"}

Where the content.xml file might contain
<text:meta xml:id="id_world">world</text:meta>
or
<text:p xml:id="id_world">world</text:p>
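Resolving such a triple means going from the id string to whatever in-memory object now stands for that element. A minimal C++ sketch follows, assuming that each xml:id seen on load is recorded with its element name and the text range it covered. `ElementRef`, `XmlIdIndex` and `resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>

struct Triple { std::string subject, predicate, object; };

// Hypothetical handle on the loaded content an xml:id names: which ODF
// element it came from and the text range it covered.
struct ElementRef {
    std::string elementName;  // e.g. "text:meta" or "text:p"
    std::size_t start;
    std::size_t end;
};

class XmlIdIndex {
public:
    // Returns false if the id is already taken (xml:id must be unique).
    bool insert(const std::string& xmlId, const ElementRef& ref) {
        return ids_.emplace(xmlId, ref).second;
    }

    // Follow a pkg:idref triple to the element it points at, if any.
    std::optional<ElementRef> resolve(const Triple& t) const {
        if (t.predicate != "pkg:idref") return std::nullopt;
        auto it = ids_.find(t.object);
        if (it == ids_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::unordered_map<std::string, ElementRef> ids_;
};
```

The lookup itself is trivial. The stored offsets go stale as soon as the text is edited, which is the synchronization cost Thorsten points out below.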


> 
> Storing the metadata internally has the benefit that there is no need to
> synchronize a central map when e.g. an element is deleted/created ... That is
> also the way koffice works most of the time: not keeping the old ids but
> generating new ones on save.

One issue here is that even the RDF from the external manifest.rdf file
can reference elements in content.xml by their xml:id. So for the
metadata from manifest.rdf to remain valid, either the xml:id values must
be preserved or the RDF triples must be modified at save time to use the
newly generated xml:id values in place of the old ones.

Aside from the xml:id issue, there is also the main problem I'm having:
I need some way to scope ODF elements like text:p in a loaded KOffice
document so I have some C++ object to attach an xml:id and/or metadata
to. If anyone has any hints as to what I could use for such a
non-displaying container, that would be wonderful.

> Also, koffice might handle stuff differently internally than it is saved in
> odf, so keeping the data in sync might be problematic. This also means the
> xml:id is not preserved, but that should do no harm as the data is saved back
> again with a new xml:id.
> Having a separate map makes it easier to load and save the data.
> 
> I hope that explains it a little bit more. Looking forward to hearing what
> you think.

I'll probably be on IRC in the .de afternoon until about 19:00.


> 
> Thorsten


_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel

