List: calligra-devel
Subject: Re: a new library for traversing odf files and a new export filter
From: Sebastian Sauer <mail () dipe ! org>
Date: 2013-03-26 9:52:59
Message-ID: 51516FFB.3010709 () dipe ! org
On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
> > Hola,
> >
> > On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge@lysator.liu.se>
> > wrote:
> >
> > On Monday, March 25, 2013 17:54:53 matus.uzak@gmail.com wrote:
> > > Hi,
> > >
> > > Sorry for not joining the discussion earlier, but I did not have
> > > much free time over the last two weeks.
> > >
> > > I think we should continue the parser type discussion in order to
> > > also improve the state of things in libmsooxml. What we have there
> > > is a PULL parser, and I identified the following problems (it
> > > would be cool if Lassi could check those):
> > >
> > > 1. OOXML sometimes requires us to run the parser twice on one
> > > element in order to first collect selected information required
> > > to convert the content of child elements.
> > >
> > > 2. There are situations where conversion of the 1st child of the
> > > root element requires information from the last child of the root
> > > element.
> >
> > It would be interesting to see some examples of these two issues.
> >
> >
> > As an example: in pptx files, slides can contain text that is
> > specified to use the theme color lt1.
> >
> > I don't remember the exact syntax, but something like:
> > <p>
> > <rPr color="lt1"/>
> > <r>Hejsan</r>
> > </p>
> >
> > Then, as the last element of that slide, there may or may not be
> > <clrMap lt1="bg1" ...../> // or something similar
> >
> > Which means that lt1 should be interpreted as bg1 for this
> > particular slide.
> > Currently what we're doing is that we first read the slide once,
> > skipping everything except clrMap. Then we read the slide again
> > (yay!) and start the real conversion.
> >
> > There was something similar in xlsx filters too if my memory serves
> > me correctly.
> >
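For what it's worth, the two-pass approach described above can be
sketched roughly as follows. This is only an illustration in Python
using the pull-style xml.etree.ElementTree.iterparse from the standard
library, not the libmsooxml code. The element names (slide, p, rPr, r,
clrMap) and the helper names collect_color_map and convert are made up
to match the pseudo-XML in this thread; real PresentationML looks
different.

```python
import io
import xml.etree.ElementTree as ET

# Simplified, made-up slide markup modelled on the pseudo-XML in this
# thread; it is NOT real PresentationML.
SLIDE = b"""<slide>
  <p><rPr color="lt1"/><r>Hejsan</r></p>
  <clrMap lt1="bg1"/>
</slide>"""


def collect_color_map(xml_bytes):
    """Pass 1: ignore everything except clrMap and return its mapping."""
    mapping = {}
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "clrMap":
            mapping.update(elem.attrib)
    return mapping


def convert(xml_bytes, color_map):
    """Pass 2: the real conversion, resolving colors through color_map."""
    runs = []
    color = None
    # iterparse reports "end" events by default, so rPr is seen before
    # the r element that follows it inside the same paragraph.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if elem.tag == "rPr":
            name = elem.get("color")
            color = color_map.get(name, name)  # fall back to the raw name
        elif elem.tag == "r":
            runs.append((elem.text, color))
        elif elem.tag == "p":
            color = None  # run properties do not leak into the next paragraph
    return runs


color_map = collect_color_map(SLIDE)
print(convert(SLIDE, color_map))  # [('Hejsan', 'bg1')]
```

The cost is plain to see: every slide is parsed twice just to find one
element that may sit at the very end.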
>
> See also the somewhat related XmlWriteBuffer in
> filters/libmsooxml/MsooXmlUtils.h, which is used "when information that
> has to be written in advance is based on XML elements parsed later.
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
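The general idea of such a write buffer can be sketched as below. To be
clear, this is not the XmlWriteBuffer API; DeferredWriter and
convert_row are hypothetical names, and the row/cell output format is
invented purely for illustration. The point is only that body output
is held back until the information for the opening tag is known.

```python
import io


class DeferredWriter:
    """Hypothetical helper: holds back body output until the header is known."""

    def __init__(self, out):
        self._out = out
        self._body = io.StringIO()

    def write_body(self, text):
        # Buffered in memory instead of going straight to the output.
        self._body.write(text)

    def finish(self, header):
        # The header depends on what was seen while the body was buffered.
        self._out.write(header)
        self._out.write(self._body.getvalue())


def convert_row(cells, out):
    """Write a row whose opening tag needs a count that is only known at the end."""
    writer = DeferredWriter(out)
    count = 0
    for cell in cells:  # may be a one-shot iterator, as with a pull parser
        writer.write_body(f"<cell>{cell}</cell>")
        count += 1
    writer.finish(f'<row cells="{count}">')
    out.write("</row>")


out = io.StringIO()
convert_row(iter(["a", "b"]), out)
print(out.getvalue())  # <row cells="2"><cell>a</cell><cell>b</cell></row>
```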
>
> In the case of XSLT I also remember that there was a problem with
> offset references, meaning something like (pseudo-XML):
>
> <style>
> <item>index 0</item>
> <item>index 1</item>
> <item>index 2</item>
> </style>
>
> <content>
> <content withStyleIndex="1"/> // where 1 refers to the second
> style item
> </content>
>
> XSLT does iirc not allow such index-based reference fetching, making
> it necessary to for-loop with a counter over the <style> items every
> time they are referenced. Super expensive, and iirc no caching is done
> (my knowledge there is a few years old, so maybe that changed). A
> classic case where one would like to introduce a "caching concept":
> read all the items at once, prepare them, and access them later
> directly by index from a style container/manager. OOXML makes quite a
> lot of use of such index-based references, being a 1:1 port from
> C/C++ to XML.
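A minimal sketch of that caching idea, in Python rather than XSLT and
using the pseudo-XML above wrapped in an assumed <doc> root:
build_style_table and resolve_styles are hypothetical names, not taken
from any existing filter. The style items are read once into a list,
so every later reference is a direct index lookup instead of a loop.

```python
import xml.etree.ElementTree as ET

# The pseudo-XML from this thread, made well-formed and wrapped in an
# assumed <doc> root element.
DOC = """<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
  </content>
</doc>"""


def build_style_table(root):
    """Read all style items once so they can be fetched by index later."""
    return [item.text for item in root.find("style")]


def resolve_styles(root, styles):
    """Resolve every withStyleIndex reference with a direct list lookup."""
    resolved = []
    for elem in root.iter("content"):
        index = elem.get("withStyleIndex")
        if index is not None:
            resolved.append(styles[int(index)])
    return resolved


root = ET.fromstring(DOC)
styles = build_style_table(root)
print(resolve_styles(root, styles))  # ['index 1']
```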
Also somewhat related: it is hard to say whether this was caused by ugly
design decisions alone or driven by XSLT limitations (I would think
both), but years ago, when the CleverAge OOXML=>ODF converter sponsored
by Microsoft appeared during the OOXML ISO battle, I investigated that
code (for my diploma thesis, which had OOXML<=>ODF as its subject). It
used lots of intermediate steps (pre- and post-processing, multiple XSLT
runs).
Code is still available at:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
The main converter lib:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
The xsl's:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/
It wasn't that bad, but I can confirm what Rob Weir's blog said back
then: the converter needs >10x longer than anything else and is a
memory monster.
>
>
>
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel