List:       calligra-devel
Subject:    Re: a new library for traversing odf files and a new export filter
From:       Sebastian Sauer <mail () dipe ! org>
Date:       2013-03-26 9:52:59
Message-ID: 51516FFB.3010709 () dipe ! org

On 03/26/2013 04:32 PM, Sebastian Sauer wrote:
> On 03/26/2013 02:51 PM, Lassi Nieminen wrote:
> > Hola,
> > 
> > On Mon, Mar 25, 2013 at 8:12 PM, Inge Wallin <inge@lysator.liu.se>
> > wrote:
> > 
> > On Monday, March 25, 2013 17:54:53 matus.uzak@gmail.com wrote:
> > > Hi,
> > > 
> > > sorry for not discussing earlier, but I did not have much free time
> > > in the last two weeks.
> > > 
> > > I think we should continue the parser type discussion in order to
> > > also improve the state of things in libmsooxml.  What we have there
> > > is a PULL parser. And I identified the following problems (would be
> > > cool if Lassi could check those):
> > > 
> > > 1. OOXML sometimes requires us to run the parser twice at one
> > > element in order to first collect selected information required to
> > > convert the content of child elements.
> > > 
> > > 2. There are situations when conversion of the 1st child of the root
> > > element requires information from the last child of the root
> > > element.
> > 
> > It would be interesting to see some examples of these two issues.
> > 
> > 
> > As an example : in pptx files, in slides,
> > there can be text which is specified to use theme color lt1
> > 
> > Don't remember the exact syntax, but something like
> > <p>
> > <rPr color="lt1"/>
> > <r>Hejsan</r>
> > </p>
> > 
> > Then as the last element of that slide there may or may not be
> > <clrMap lt1="bg1" ...../> // or something similar
> > 
> > Which means that lt1 should be interpreted to be bg1 for this 
> > particular slide.
> > Currently what we're doing is that we first read the slide once, 
> > skipping everything
> > except clrMap. Then we read the slide again (yay!) and start the real 
> > conversion.
> > 
> > There was something similar in xlsx filters too if my memory serves 
> > me correctly.
> > 
> 
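The two-pass reading described above could be sketched roughly like this. 
It is only an illustration in Python, not the libmsooxml code: the slide 
markup is the simplified pseudo-syntax from the mail (real PresentationML 
uses namespaces and a different layout), and collect_color_map and 
convert_runs are made-up names.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified slide; element and attribute names follow the
# pseudo-syntax from the mail, not the real PresentationML schema.
SLIDE = """<sld>
  <p><rPr color="lt1"/><r>Hejsan</r></p>
  <clrMap lt1="bg1"/>
</sld>"""


def collect_color_map(xml_text):
    """Pass 1: skip everything except clrMap and return its mapping."""
    for _event, elem in ET.iterparse(io.StringIO(xml_text), events=("end",)):
        if elem.tag == "clrMap":
            return dict(elem.attrib)
    return {}  # clrMap may or may not be present


def convert_runs(xml_text, color_map):
    """Pass 2: the real conversion, resolving colors through the map."""
    color = None
    runs = []
    for _event, elem in ET.iterparse(io.StringIO(xml_text), events=("end",)):
        if elem.tag == "rPr":
            name = elem.get("color")
            # Fall back to the unmapped name when the slide has no override.
            color = color_map.get(name, name)
        elif elem.tag == "r":
            runs.append((elem.text, color))
    return runs


color_map = collect_color_map(SLIDE)
runs = convert_runs(SLIDE, color_map)
```

The price is obvious: every slide is parsed twice, even though the first 
pass only cares about a single element.
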
> See also somewhat related XmlWriteBuffer in 
> filters/libmsooxml/MsooXmlUtils.h which is used "when information that 
> has to be written in advance is based on XML elements parsed later.  
> In such case the information cannot be saved in one pass" for OOXML=>ODF.
> 
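The idea behind such a write buffer can be sketched as follows. This is 
not the XmlWriteBuffer API from MsooXmlUtils.h; DeferredBuffer, its 
methods and the style name "P1" are invented for the example. Output 
produced during parsing is parked in a temporary buffer and only flushed 
once the information that has to precede it is known.

```python
import io


class DeferredBuffer:
    """Hypothetical stand-in for a write buffer, not the Calligra class."""

    def __init__(self, out):
        self._out = out                 # the real output stream
        self._pending = io.StringIO()   # output parked until release()

    def write(self, text):
        # Everything written while parsing goes to the temporary buffer.
        self._pending.write(text)

    def release(self, header):
        # The late information is now known: emit it first, then flush
        # the buffered output after it.
        self._out.write(header)
        self._out.write(self._pending.getvalue())
        self._pending = io.StringIO()


out = io.StringIO()
buf = DeferredBuffer(out)
buf.write("<text:span>Hejsan</text:span>")      # child converted first
buf.release('<text:p text:style-name="P1">')    # parent start tag known later
out.write("</text:p>")
result = out.getvalue()
```
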
> In the case of XSLT I also remember that there was a problem with 
> offset-references. Means something like (pseudo-xml):
> 
> <style>
> <item>index 0</item>
> <item>index 1</item>
> <item>index 2</item>
> </style>
> 
> <content>
> <content withStyleIndex="1"/> // where 1 refers to the second 
> style-item
> </content>
> 
> XSLT does not, iirc, allow such index-based reference-fetching, so you 
> have to for-loop with a counter over the <style> items every time they 
> are referenced. Super expensive, and iirc no caching is done (my 
> knowledge there is a few years old, so maybe that changed). A classic 
> case where one would like to introduce a "caching concept": read all 
> the items at once, prepare them, and access them later directly by 
> index from a style container/manager. OOXML makes quite a lot of use of 
> such index-based references, being a 1:1 port from C/C++ to XML.
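
A minimal sketch of that caching concept, in Python rather than XSLT or 
our C++ filters. The <doc> wrapper element is added only so the pseudo-xml 
parses as one document, and style_for is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The pseudo-xml from above, wrapped in a hypothetical <doc> root.
DOC = """<doc>
  <style>
    <item>index 0</item>
    <item>index 1</item>
    <item>index 2</item>
  </style>
  <content>
    <content withStyleIndex="1"/>
  </content>
</doc>"""

root = ET.fromstring(DOC)

# Read all style items exactly once; the list position is the index.
styles = [item.text for item in root.find("style").iter("item")]


def style_for(elem):
    """Direct lookup by index instead of re-walking <style> each time."""
    return styles[int(elem.get("withStyleIndex"))]


resolved = [
    style_for(elem)
    for elem in root.iter("content")
    if elem.get("withStyleIndex") is not None
]
```

Each reference then costs one list access, no matter how many style items 
there are.
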

Also somewhat related: years ago, when the CleverAge OOXML=>ODF converter 
sponsored by Microsoft appeared during the OOXML ISO battle, I 
investigated that code (for my diploma thesis, which had OOXML<=>ODF as 
its subject). It used lots of intermediate steps (pre- and 
post-processing, multiple XSLT runs). Hard to say whether that was caused 
by ugly design decisions alone or driven by XSLT limitations (I would 
think both).

Code is still available at: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/
Readme: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Readme.txt?revision=5309&view=markup
 The main converter lib: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/
 The xsl's: 
http://odf-converter.svn.sourceforge.net/viewvc/odf-converter/trunk/source/Common/OdfConverterLib/resources/oox2odf/


It wasn't that bad, but I can confirm what Rob Weir's blog said back then: 
the converter needs >10x longer than anything else and is a memory monster.

> 
> 
> 
> _______________________________________________
> calligra-devel mailing list
> calligra-devel@kde.org
> https://mail.kde.org/mailman/listinfo/calligra-devel

