List: calligra-devel
Subject: Re: a new library for traversing odf files and a new export filter
From: Sebastian Sauer <mail () dipe ! org>
Date: 2013-03-26 9:52:59
Message-ID: 51516FFB.3010709 () dipe ! org
On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
> > Hola,
> >
> > On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge@lysator.liu.se
> > <mailto:inge@lysator.liu.se>> wrote:
> >
> > On Monday, March 25, 2013 17:54:53 matus.uzak@gmail.com
> > <mailto:matus.uzak@gmail.com> wrote:
> > > Hi,
> > >
> > > sorry for not discussing earlier, but I did not have much free
> > time last
> > > two weeks.
> > >
> > > I think we should continue the parser type discussion in order
> > to also
> > > improve state of things in libmsooxml. What we have there is a
> > PULL
> > > parser. And I identified the following problems (Would be cool
> > if Lassi
> > > could check those):
> > >
> > > 1. OOXML sometimes requires us to run the parser twice at one
> > element in
> > > order to first collect selected information required to convert
> > the content
> > > of child elements.
> > >
> > > 2. There are situations when conversion of the 1st child of the
> > root
> > > element requires information from the last child of the root
> > element.
> >
> > It would be interesting to see some examples of these two issues.
> >
> >
> > As an example: in pptx files, in slides,
> > there can be text which is specified to use the theme color lt1
> >
> > Don't remember the exact syntax, but something like
> > <p>
> > <rPr color="lt1"/>
> > <r>Hejsan</r>
> > </p>
> >
> > Then as the last element of that slide there may or may not be
> > <clrMap lt1="bg1" ...../> // or something similar
> >
> > Which means that lt1 should be interpreted to be bg1 for this
> > particular slide.
> > Currently what we're doing is that we first read the slide once,
> > skipping everything
> > except clrMap. Then we read the slide again (yay!) and start the real
> > conversion.
> >
> > There was something similar in xlsx filters too if my memory serves
> > me correctly.
> >
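The two-pass reading Lassi describes could be sketched roughly like this. This is only an illustration in Python, not the libmsooxml code: the element and attribute names follow the pseudo-syntax quoted above rather than the real PresentationML schema, and the function names are made up for this sketch.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified slide. Names follow the pseudo-syntax from the
# mail, not the real PresentationML schema.
SLIDE = """<slide>
  <p><rPr color="lt1"/><r>Hejsan</r></p>
  <clrMap lt1="bg1"/>
</slide>"""


def collect_color_map(xml_text):
    """Pass 1: skip everything except clrMap and return its mapping."""
    mapping = {}
    for _event, elem in ET.iterparse(io.StringIO(xml_text), events=("end",)):
        if elem.tag == "clrMap":
            mapping.update(elem.attrib)
    return mapping


def convert_runs(xml_text, color_map):
    """Pass 2: the real conversion, resolving colors through the map."""
    runs = []
    color = None
    events = ("start", "end")
    for event, elem in ET.iterparse(io.StringIO(xml_text), events=events):
        if event == "start" and elem.tag == "p":
            color = None  # each paragraph starts without a color
        elif event == "end" and elem.tag == "rPr":
            name = elem.get("color")
            # Fall back to the unmapped name if the slide has no override.
            color = color_map.get(name, name)
        elif event == "end" and elem.tag == "r":
            runs.append((elem.text, color))
    return runs


def convert_slide(xml_text):
    """Read the slide twice: once for clrMap, once for the content."""
    return convert_runs(xml_text, collect_color_map(xml_text))
```

Calling convert_slide(SLIDE) reads the input twice, which is exactly the cost being complained about: the first pass exists only because clrMap comes after the text that depends on it.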
>
> See also somewhat related XmlWriteBuffer in
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that
> has to be written in advance is based on XML elements parsed later.
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
>
> In the case of XSLT I also remember that there was a problem with
> offset references, meaning something like (pseudo-xml):
>
> <style>
> <item>index 0</item>
> <item>index 1</item>
> <item>index 2</item>
> </style>
>
> <content>
> <content withStyleIndex="1"/> // where 1 refers to the second
> style item
> </content>
>
> XSLT does iirc not allow such index-based reference fetching, so you
> have to for-loop with a counter over the <style> items every time
> they are referenced. Super expensive, and iirc no caching is done (my
> knowledge there is a few years old, so maybe that changed). A classic
> case where one would like to introduce a "caching concept": read
> all the items at once, prepare them and access them later on directly by
> index from a style container/manager. OOXML makes quite a lot of use of
> such index-based references, being a 1:1 port from C/C++ to XML.
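The "caching concept" from the paragraph above could look roughly like this. It is a minimal sketch in Python, assuming the pseudo-xml from the mail wrapped in a made-up <doc> root; resolve_styles is a hypothetical name, not anything from Calligra or the CleverAge converter.

```python
import xml.etree.ElementTree as ET

# The pseudo-xml from the mail, wrapped in a hypothetical <doc> root.
DOC = """<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
    <content withStyleIndex="2"/>
    <content withStyleIndex="1"/>
  </content>
</doc>"""


def resolve_styles(xml_text):
    """Read all style items once, then resolve references by list index."""
    root = ET.fromstring(xml_text)
    # One pass over <style>; afterwards every lookup is a plain list index
    # instead of a counted loop over the items.
    styles = [item.text for item in root.find("style")]
    resolved = []
    for node in root.find("content"):
        index = int(node.get("withStyleIndex"))
        resolved.append(styles[index])
    return resolved
```

The style items are walked once no matter how often they are referenced, which is the point of preparing them up front in a style container.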
Also somewhat related: hard to say whether it was caused by ugly design
decisions alone or driven by XSLT limitations (I would think both), but
years ago, when the CleverAge OOXML=>ODF converter sponsored by Microsoft
appeared during the OOXML ISO battle, I investigated that code (for my
diploma thesis, which had OOXML<=>ODF as its subject). It had lots of
intermediate steps (pre- and post-processing, multiple XSLT runs).
Code is still available at:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
The main converter lib:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
The xsl's:
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/
It wasn't that bad, but I can confirm what Rob Weir's blog said back then:
the converter takes >10x longer than anything else and is a memory monster.
>
>
>
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel